Abstract

Let $\alpha$ be the maximal value such that the product of an $n \times n^\alpha$ matrix by an $n^\alpha \times n$ matrix can be computed with $n^{2+o(1)}$ arithmetic operations. In this paper we show that $\alpha > 0.30298$, which improves the previous record $\alpha > 0.29462$ by Coppersmith (Journal of Complexity, 1997). More generally, we construct a new algorithm for multiplying an $n \times n^k$ matrix by an $n^k \times n$ matrix, for any value $k \neq 1$. The complexity of this algorithm is better than all known algorithms for rectangular matrix multiplication. In the case of square matrix multiplication (i.e., for $k = 1$), we recover exactly the complexity of the algorithm by Coppersmith and Winograd (Journal of Symbolic Computation, 1990).

These new upper bounds can be used to improve the time complexity of several known algorithms that rely on rectangular matrix multiplication. For example, we directly obtain an $O(n^{2.5302})$-time algorithm for the all-pairs shortest paths problem over directed graphs with small integer weights, improving over the $O(n^{2.575})$-time algorithm by Zwick (JACM 2002), and also improve the time complexity of sparse square matrix multiplication.

1 Introduction 

Background. Matrix multiplication is one of the most fundamental problems in computer science and mathematics. Besides the fact that several computational problems in linear algebra can be reduced to the computation of the product of two matrices, the complexity of matrix multiplication also arises as a bottleneck in a multitude of other computational tasks (e.g., graph algorithms). The standard method for multiplying two $n \times n$ matrices uses $O(n^3)$ arithmetic operations. Strassen showed in 1969 that this trivial algorithm is not optimal, and gave an algorithm that uses only $O(n^{2.808})$ arithmetic operations. This was the beginning of a long series of improvements that led to the upper bound $O(n^{2.376})$ by Coppersmith and Winograd [10], which has very recently been further improved to $O(n^{2.3727})$ by Vassilevska Williams [27]. Note that all the above complexities refer to the number of arithmetic operations involved, but naturally the same upper bounds hold for the time complexity as well when each arithmetic operation can be done in negligible time (e.g., in $\mathrm{poly}(\log n)$ time).

Finding the optimal value of the exponent of square matrix multiplication is naturally one of the most important open problems in algebraic complexity. It is widely believed that the product of two $n \times n$ matrices can be computed with $O(n^{2+\epsilon})$ arithmetic operations for any constant $\epsilon > 0$. Several conjectures, including conjectures about combinatorial structures [10] and about group theory [?], would, if true, lead to this result (see also [1] for recent work on these conjectures). Another way to interpret this open problem is by considering the multiplication of an $n \times m$ matrix by an $m \times n$ matrix. Suppose that the matrices are defined over a field. For any $k > 0$, define the exponent of such a rectangular matrix multiplication as follows:

$$\omega(1,1,k) = \inf\{\tau \in \mathbb{R} \mid C(n, n, \lfloor n^k \rfloor) = O(n^\tau)\},$$

where $C(n, n, \lfloor n^k \rfloor)$ denotes the minimum number of arithmetic operations needed to multiply an $n \times \lfloor n^k \rfloor$ matrix by an $\lfloor n^k \rfloor \times n$ matrix. Note that, while the value $\omega(1,1,k)$ may depend on the field under consideration, it is known that it can depend only on the characteristic of the field [23]. Define $\omega = \omega(1,1,1)$ and $\alpha = \sup\{k \mid \omega(1,1,k) = 2\}$. The value $\omega$ represents the exponent of square matrix multiplication, and the value $\alpha$ essentially represents the largest value such that the product of an $n \times n^\alpha$ matrix by an $n^\alpha \times n$ matrix can be computed with $O(n^{2+\epsilon})$ arithmetic operations for any constant $\epsilon > 0$. Since $\omega = 2$ if and only if $\alpha = 1$, one possible strategy towards showing that $\omega = 2$ is to give lower bounds on $\alpha$. Coppersmith [8] showed in 1982 that $\alpha > 0.172$. Then, based on the techniques developed in [10], Coppersmith [9] improved this lower bound to $\alpha > 0.29462$. This is the best lower bound on $\alpha$ known so far.

Except for Coppersmith's works on the value $\alpha$, there have been relatively few algorithms that focused specifically on rectangular matrix multiplication. Since it is well known (see, e.g., [17]) that multiplying an $n \times n$ matrix by an $n \times m$ matrix, or an $m \times n$ matrix by an $n \times n$ matrix, can be done with the same number of arithmetic operations as multiplying an $n \times m$ matrix by an $m \times n$ matrix, the value $\omega(1,1,k)$ represents the exponent of all three of these types of rectangular matrix multiplication. Note that, by decomposing the product into smaller matrix products, it is easy to obtain (see, e.g., [17]) the following upper bound:

$$\omega(1,1,k) \le \begin{cases} 2 & \text{if } 0 \le k \le \alpha,\\[4pt] 2 + (\omega - 2)\dfrac{k - \alpha}{1 - \alpha} & \text{if } \alpha \le k \le 1. \end{cases} \qquad (1)$$
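As a quick numerical illustration, the piecewise-linear bound of Equation (1) can be evaluated directly. The sketch below uses a hypothetical helper `interpolation_bound` (our own name, not from the paper) that takes a known lower bound `alpha` on $\alpha$ and a known upper bound `omega` on $\omega$, and returns the resulting upper bound on $\omega(1,1,k)$ for $0 \le k \le 1$.

```python
def interpolation_bound(k: float, alpha: float, omega: float) -> float:
    """Upper bound on omega(1, 1, k) from Equation (1), for 0 <= k <= 1.

    alpha: a known lower bound on the exponent alpha (so omega(1, 1, k) = 2 for k <= alpha).
    omega: a known upper bound on the square exponent omega = omega(1, 1, 1).
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("Equation (1) is stated for 0 <= k <= 1")
    if k <= alpha:
        return 2.0
    # Linear interpolation between the points (alpha, 2) and (1, omega).
    return 2.0 + (omega - 2.0) * (k - alpha) / (1.0 - alpha)


if __name__ == "__main__":
    # Bounds quoted in the text: alpha > 0.29462 (Coppersmith) and omega < 2.376.
    for k in (0.2, 0.5, 0.8, 1.0):
        print(k, round(interpolation_bound(k, 0.29462, 2.376), 4))
```

With these inputs the bound at $k = 0.8$ is about $2.269$, which is weaker than the bound $\omega(1,1,0.8) < 2.2356$ reported by Ke, Zeng, Han and Pan [16].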



Lotti and Romani [17] obtained nontrivial upper bounds on $\omega(1,1,k)$ based on the seminal result by Coppersmith [8] and on early works on square matrix multiplication. Huang and Pan [13] showed how to apply ideas from [10] to the rectangular setting and obtained the upper bound $\omega(1,1,2) < 3.333954$, but this approach did not lead to any upper bound better than Equation (1) for $k < 1$. Ke, Zeng, Han and Pan [16] further improved Huang and Pan's result to $\omega(1,1,2) < 3.2699$, again using the approach from [10], and also reported the upper bounds $\omega(1,1,0.8) < 2.2356$ and $\omega(1,1,0.5356) < 2.0712$, which are better than those obtained from Equation (1). Their approach, nevertheless, did not give any improvement for the value of $\alpha$.

Besides the fact that a better understanding of $\omega(1,1,k)$ gives insights into the nature of matrix multiplication and may ultimately help show that $\omega = 2$, fast algorithms for multiplying an $n \times n^k$ matrix by an $n^k \times n$ matrix with $k \neq 1$ also have a multitude of applications. Typical examples not directly related to linear algebra include the construction of fast algorithms for the all-pairs shortest paths problem [2, 19, 29, 33, 34], the dynamic computation of the transitive closure [12, 21], finding ancestors [11], detecting directed cycles [31], or computing the diameter of a graph [30]. Rectangular matrix multiplication has also been used in computational complexity [18, 28], and to speed up sparse square matrix multiplication [3, 15, 32] or tasks in computational geometry [14, 15]. Obtaining new upper bounds on $\omega(1,1,k)$ would thus reduce the asymptotic time complexity of algorithms in a wide range of areas. We nevertheless stress that such improvements are only of theoretical interest, since the huge constants involved in the complexity of fast matrix multiplication usually make these algorithms impractical.

Short description of the approach by Coppersmith and Winograd. The results [9, 13, 16, 24, 27] mentioned above are all obtained by extending the approach by Coppersmith and Winograd [10]. This approach is an illustration of a general methodology initiated in the 1970s based on the theory of bilinear and trilinear forms, through which most of the improvements for matrix multiplication have been obtained. Informally, the idea is to start with a basic construction (some small trilinear form), and then exploit general properties of matrix multiplication (in particular Schönhage's asymptotic sum inequality [23]) to derive an upper bound on the exponent $\omega$ from this construction. The main contributions of [10] consist of two parts: the discovery of new basic constructions and the introduction of strong techniques to analyze them. In their paper, Coppersmith and Winograd actually present three algorithms, based on three different basic constructions. The first basic construction (Section 6 in [10]) is the simplest of the three and leads to the upper bound $\omega < 2.40364$. The second basic construction (Section 7 in [10]), which we will refer to in this paper as $F_q$ (here $q \in \mathbb{N}$ is a parameter), leads to the upper bound $\omega < 2.38719$. The third basic construction (Section 8 in [10]) is $F_q \otimes F_q$, the tensor product of two instances of $F_q$, and leads to the improved upper bound $\omega < 2.375477$.

In view of the last result, it was natural to ask whether taking larger tensor powers $F_q^{\otimes r}$ of $F_q$ as the basic construction leads to better bounds on $\omega$. The case $r = 3$ was explicitly mentioned as an open problem in [10] but did not seem to lead to any improvement. Stothers [24] and Vassilevska Williams [27] succeeded in analyzing the fourth tensor power $F_q^{\otimes 4}$ and obtained a better upper bound on $\omega$, the first improvement in more than twenty years. Vassilevska Williams further presented a general framework that enables a systematic analysis of higher tensor powers of the basic construction, and used this framework to show that $\omega < 2.3727$ for the basic construction $F_q^{\otimes 8}$, the best upper bound obtained so far.

The algorithms for rectangular matrix multiplication [9, 13, 16] already mentioned use a similar approach. Huang and Pan [13] obtained their improvement on $\omega(1,1,2)$ by taking the simplest of the three constructions in [10] and carefully modifying the analysis to evaluate the complexity of rectangular matrix multiplication. Ke, Zeng, Han and Pan [16] obtained their improvements similarly, but by using the second basic construction from [10] (the construction $F_q$) instead, which led to better upper bounds. These approaches, while very natural, do not provide any nontrivial lower bounds on $\alpha$: the upper bounds on $\omega(1,1,k)$ obtained are strictly larger than 2 even for small values of $k$. In order to obtain the lower bound $\alpha > 0.29462$, Coppersmith [9] relied on a more complex approach: the basic construction considered is still $F_q$, but several instances for distinct values of $q$ are combined together in a subtle way in order to keep the complexity of the resulting algorithm small enough (i.e., not larger than $n^{2+o(1)}$).

Statement of our results and discussion. In this paper we construct new algorithms for rectangular matrix multiplication, by taking the tensor power $F_q \otimes F_q$ as basic construction and analyzing this construction in the framework of rectangular matrix multiplication. We use these ideas to prove that $\omega(1,1,k) = 2$ for any $k < 0.3029805$, as stated in the following theorem.

Theorem 1.1. For any value $k < 0.3029805\ldots$, the product of an $n \times n^k$ matrix by an $n^k \times n$ matrix can be computed with $O(n^{2+\epsilon})$ arithmetic operations for any constant $\epsilon > 0$.

Theorem 1.1 shows that $\alpha > 0.30298$, which improves the previous record $\alpha > 0.29462$ by Coppersmith. More generally, in the present work we present an algorithm for multiplying an $n \times n^k$ matrix by an $n^k \times n$ matrix, for any value of $k$. We show that the complexity of this algorithm can be expressed as a (nonlinear) optimization problem, and use this formulation to derive upper bounds on $\omega(1,1,k)$. Table 1 shows the bounds we obtain for several values of $k$. The bounds obtained for $k < 1$ are represented in Figure 1 as well.

The results of this paper can be seen as a generalization of Coppersmith and Winograd's approach to the rectangular setting. In the case of square matrix multiplication (i.e., for $k = 1$), we naturally recover the same upper bound $\omega(1,1,1) < 2.375477$ as the one obtained in [10]. Let us mention that we can, in a rather straightforward way, combine our results with the upper bound $\omega < 2.3727$ by Vassilevska Williams [27] to obtain slightly improved bounds for $k \approx 1$. The idea is, very similarly to how Equation (1) was obtained, to exploit the convexity of the function $k \mapsto \omega(1,1,k)$. Concretely, for any fixed value $0 \le k_0 < 1$, the inequality

$$\omega(1,1,k) \le \omega(1,1,k_0) + \big(\omega - \omega(1,1,k_0)\big)\frac{k - k_0}{1 - k_0}$$

holds for any $k$ such that $k_0 \le k \le 1$. This enables us to combine an upper bound on $\omega(1,1,k_0)$, for instance one of the values in Table 1, with the improved upper bound $\omega < 2.3727$ by Vassilevska Williams. Since the improvement is small and concerns only the case $k \approx 1$, we will not discuss it further.
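The convexity argument amounts to drawing a chord between two known bounds. The following sketch, with a hypothetical helper `convex_bound` of our own naming, evaluates the right-hand side of the inequality above from any upper bound `w_k0` on $\omega(1,1,k_0)$ and any upper bound `omega` on $\omega$; since both inputs are upper bounds, the chord through them stays above the true convex function.

```python
def convex_bound(k: float, k0: float, w_k0: float, omega: float) -> float:
    """Upper bound on omega(1, 1, k) for k0 <= k <= 1, by convexity of k -> omega(1, 1, k).

    w_k0:  any known upper bound on omega(1, 1, k0).
    omega: any known upper bound on omega = omega(1, 1, 1).
    """
    if not 0.0 <= k0 < 1.0 or not k0 <= k <= 1.0:
        raise ValueError("need 0 <= k0 <= k <= 1 with k0 < 1")
    # Chord between (k0, w_k0) and (1, omega); convexity keeps omega(1, 1, k) below it.
    return w_k0 + (omega - w_k0) * (k - k0) / (1.0 - k0)


if __name__ == "__main__":
    # Example inputs: k0 = 0.30298 with w_k0 = 2 (Theorem 1.1) and omega < 2.3727.
    for k in (0.5, 0.8, 0.95):
        print(k, round(convex_bound(k, 0.30298, 2.0, 2.3727), 4))
```

With $k_0 = 0.30298$ and $w_{k_0} = 2$ this is exactly the interpolation of Equation (1) with the newer bounds; plugging in a larger $k_0$ together with a bound on $\omega(1,1,k_0)$ from Table 1 gives the slightly improved bounds for $k \approx 1$ mentioned above.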

For $k > 0.29462$ and $k \neq 1$, the complexity of our algorithms is better than that of all known algorithms for rectangular matrix multiplication, including the algorithms [13, 16] mentioned above. Moreover, for $0.30298 < k < 1$, our new bounds are significantly better than what can be obtained solely from the bounds $\alpha > 0.30298$ and $\omega < 2.375477$ through Equation (1), as illustrated in Figure 1. This suggests that non-negligible improvements can be obtained for all applications of rectangular matrix multiplication that rely on this simple linear interpolation; we will elaborate on this subject in Subsection 1.1.

Table 1: Our upper bounds on the exponent of the multiplication of an $n \times n^k$ matrix by an $n^k \times n$ matrix.

Figure 1: Our upper bounds (plain line) on $\omega(1,1,k)$, for $k < 1$. The dashed line represents the upper bounds on $\omega(1,1,k)$ obtained by using Equation (1) with the values $\alpha > 0.30298$ and $\omega < 2.375477$.

Let us compare our results more precisely with those reported in [16]. For $k = 2$, we obtain $\omega(1,1,2) < 3.256689$, while Ke et al. [16] obtained $\omega(1,1,2) < 3.2699$ by using the basic construction $F_q$. Our improvements are of the same order for the other two values ($k = 0.8$ and $k = 0.5356$) analyzed in [16], as can be seen from Table 1. Note that the order of magnitude of the improvements here is similar to what was obtained in [10] by changing the basic construction from $F_q$ to $F_q \otimes F_q$ for square matrix multiplication, which led to an improvement from $\omega < 2.38719$ to $\omega < 2.375477$.

A noteworthy point is that our algorithm directly leads to improved lower bounds on $\alpha$ while, as already mentioned, obtaining a nontrivial lower bound on $\alpha$ using the basic construction $F_q$ (as done in [9]) required a specific methodology. Our approach can thus be considered as a general framework to study rectangular matrix multiplication, which leads to a single optimization problem that gives upper bounds on $\omega(1,1,k)$ for any value of $k$.



1.1 Applications 

As mentioned at the beginning of the introduction, improvements on the time complexity of rectangular matrix multiplication give faster algorithms for a multitude of computational problems. In this subsection we quantitatively describe the improvements that our new upper bounds imply for some of these problems: sparse square matrix multiplication, the all-pairs shortest paths problem and the dynamic computation of the transitive closure of a graph.

Sparse square matrix multiplication. Yuster and Zwick [32] have shown how fast algorithms for rectangular matrix multiplication can be used to construct fast algorithms for computing the product of two sparse square matrices (this result has been generalized to the product of sparse rectangular matrices in [15], and the case where the output matrix is also sparse has been studied in [3]). More precisely, let $M$ and $M'$ be two $n \times n$ matrices such that each matrix has at most $m$ non-zero entries, where $m \le n^2$. Yuster and Zwick [32] showed that the product of $M$ and $M'$ can be computed in time



where $\lambda_m$ is the solution of the equation $\lambda_m + \omega(1,1,\lambda_m) = 2\log_n(m)$. Using the upper bounds on $\omega(1,1,k)$ of Equation (1) with the values $\alpha > 0.294$ and $\omega < 2.376$, this gives the complexity depicted in Figure 2.
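To see how the equation defining $\lambda_m$ is used, the following sketch solves $\lambda_m + \omega(1,1,\lambda_m) = 2\log_n(m)$ by bisection, plugging in the piecewise-linear bound of Equation (1) for $\omega(1,1,\cdot)$. The function names are ours, and the sketch assumes $1 \le \log_n(m) \le (1+\omega)/2$, the range in which a root exists in $[0,1]$. Because the left-hand side is strictly increasing in $\lambda_m$, plain bisection on $[0,1]$ suffices.

```python
def omega_bound(k: float, alpha: float, omega: float) -> float:
    """Equation (1): piecewise-linear upper bound on omega(1, 1, k) for 0 <= k <= 1."""
    if k <= alpha:
        return 2.0
    return 2.0 + (omega - 2.0) * (k - alpha) / (1.0 - alpha)


def sparse_exponent(log_m: float, alpha: float, omega: float, iters: int = 100):
    """Solve lam + omega(1, 1, lam) = 2 * log_m by bisection on [0, 1].

    log_m is log_n(m); assumes 1 <= log_m <= (1 + omega) / 2, so a root exists.
    Returns (lam, exponent) where exponent is the bound on omega(1, 1, lam).
    """
    target = 2.0 * log_m
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # The left-hand side is strictly increasing in lam, so bisection applies.
        if mid + omega_bound(mid, alpha, omega) < target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return lam, omega_bound(lam, alpha, omega)


if __name__ == "__main__":
    # m = n^{4/3}, with alpha > 0.29462 and omega < 2.376.
    lam, exponent = sparse_exponent(4.0 / 3.0, 0.29462, 2.376)
    print(f"lambda_m = {lam:.4f}, exponent = {exponent:.4f}")
```

For $m = n^{4/3}$ this gives an exponent of about $2.1294$, consistent with the bound $O(n^{2.1293\ldots})$ discussed below; rerunning with `omega=2.3727` only lowers it to about $2.1286$.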

These upper bounds can of course be directly improved by using the new upper bound on $\omega$ by Vassilevska Williams [27] and the new lower bound on $\alpha$ given in the present work, but the improvement is small. A more significant improvement can be obtained by using directly the upper bounds on $\omega(1,1,k)$ presented in Figure 1, which give the new upper bounds on the complexity of sparse matrix multiplication depicted in Figure 2. For example, for $m = n^{4/3}$, we obtain complexity $O(n^{2.087})$, which is better than the original upper bound $O(n^{2.1293\ldots})$ obtained from Equation (1) with $\alpha > 0.294$ and $\omega < 2.376$. Note that replacing $\omega < 2.376$ with the best known bound $\omega < 2.3727$ only marginally decreases the exponent of the latter bound. Thus, even if the algorithms presented in the present paper do not give any improvement on $\omega$ (i.e., for the product of dense square matrices), we do obtain improvements for computing the product of two sparse square matrices.

Graph algorithms. Zwick [34] has shown how to use rectangular matrix multiplication to compute, with high probability, the all-pairs shortest paths in weighted directed graphs where the weights are bounded integers. The time complexity obtained is $O(n^{2+\mu+\epsilon})$, for any constant $\epsilon > 0$, where $\mu$ is the solution of the equation $\omega(1,1,\mu) = 1 + 2\mu$. Using the upper bounds on $\omega(1,1,k)$ of Equation (1) with $\alpha > 0.294$ and $\omega < 2.376$, this gives $\mu < 0.575$ and thus complexity $O(n^{2.575+\epsilon})$. This reduction to rectangular matrix multiplication is the asymptotically fastest known approach for weighted directed graphs with small integer weights.
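The value of $\mu$ can be recomputed numerically. The sketch below (helper names are ours) solves $\omega(1,1,\mu) = 1 + 2\mu$ by bisection, again using Equation (1) as a stand-in for $\omega(1,1,\cdot)$. It assumes the slope $(\omega-2)/(1-\alpha)$ of Equation (1) is below 2, so that $\omega(1,1,\mu) - (1 + 2\mu)$ is strictly decreasing and has a single root in $[0,1]$.

```python
def omega_bound(k: float, alpha: float, omega: float) -> float:
    """Equation (1): piecewise-linear upper bound on omega(1, 1, k) for 0 <= k <= 1."""
    if k <= alpha:
        return 2.0
    return 2.0 + (omega - 2.0) * (k - alpha) / (1.0 - alpha)


def apsp_mu(alpha: float, omega: float, iters: int = 100) -> float:
    """Solve omega(1, 1, mu) = 1 + 2 * mu by bisection, with Equation (1) on the left."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # g(mu) = omega(1, 1, mu) - (1 + 2 mu) is positive at 0 and decreasing.
        if omega_bound(mid, alpha, omega) > 1.0 + 2.0 * mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    mu = apsp_mu(0.29462, 2.376)
    print(f"mu = {mu:.4f}, APSP exponent = {2 + mu:.4f}")
```

With $\alpha > 0.29462$ and $\omega < 2.376$ this yields $\mu \approx 0.5746$, matching $\mu < 0.575$. The same monotonicity explains the next step: since $1 + 2 \cdot 0.5302 = 2.0604$, any bound $\omega(1,1,0.5302) < 2.0604$ makes the function negative at $0.5302$, hence $\mu < 0.5302$.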

Our results (see Table 1) show that $\omega(1,1,0.5302) < 2.0604$, which gives the upper bound $\mu < 0.5302$. We thus obtain the following result.

Theorem 1.2. There exists a randomized algorithm that computes the shortest paths between all pairs of vertices in a weighted directed graph with bounded integer weights in time $O(n^{2.5302})$, where $n$ is the number of vertices in the graph.

Note that, even if $\omega = 2$ (in which case $\omega(1,1,k) = 2$ for all $k \le 1$, and hence $\mu = 1/2$), the complexity of Zwick's algorithm is $O(n^{2.5+\epsilon})$. From this perspective, our improvements on the complexity of rectangular matrix multiplication offer a non-negligible speed-up for the all-pairs shortest paths problem in this setting.
Figure 2: Upper bounds on the exponent for the multiplication of two $n \times n$ matrices with at most $m$ non-zero entries. The horizontal axis represents $\log_n(m)$. The dashed line represents the results by Yuster and Zwick [32] and shows that the term $n^{\omega(1,1,\lambda_m)}$ dominates the complexity when $1 < \log_n(m) < (1+\omega)/2$. The plain line represents the improvements we obtain.



The same approach can be used to improve several other existing graph algorithms. Let us describe another example: algorithms for computing dynamically the transitive closure of a graph. Demetrescu and Italiano [12] presented a randomized algorithm for the dynamic transitive closure of directed acyclic graphs with $n$ vertices that answers queries in $O(n^\mu)$ time and performs updates in $O(n^{1+\mu+\epsilon})$ time, for any $\epsilon > 0$. Here $\mu$ is again the solution of the equation $\omega(1,1,\mu) = 1 + 2\mu$. This was the first algorithm for this problem with subquadratic time complexity. This result was later generalized to general graphs, with the same bounds, by Sankowski [21]. Our new upper bounds thus show the existence of an algorithm for the dynamic transitive closure that answers queries in $O(n^{0.5302})$ time and performs updates in $O(n^{1.5302})$ time.

1.2 Overview of our techniques and organization of the paper 

Before presenting an overview of the techniques used in this paper, we will give an informal description of algebraic complexity theory (the contents of which will be superseded by the formal presentation of these notions in Section 2). In this paper we will use, for any positive integer $n$, the notation $[n]$ to represent the set $\{1, \ldots, n\}$.

Trilinear forms and bilinear algorithms. The multiplication of an $m \times n$ matrix by an $n \times p$ matrix can be represented by the following trilinear form, denoted $\langle m, n, p \rangle$:

$$\langle m, n, p \rangle = \sum_{r=1}^{m} \sum_{s=1}^{n} \sum_{t=1}^{p} x_{rs}\, y_{st}\, z_{rt},$$

where $x_{rs}$, $y_{st}$ and $z_{rt}$ are formal variables. This form can be interpreted as follows: the $(r,t)$-th entry of the product of an $m \times n$ matrix $M$ by an $n \times p$ matrix $M'$ can be obtained by setting $x_{ij} = M_{ij}$ for all $(i,j) \in [m] \times [n]$ and $y_{ij} = M'_{ij}$ for all $(i,j) \in [n] \times [p]$, setting $z_{rt} = 1$ and setting all the other $z$-variables to zero. One can then think of the $z$-variables as formal variables used to record the entries of the matrix product.
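The following pure-Python sketch makes this interpretation concrete; the helper names are ours and indices are 0-based. It evaluates $\langle m,n,p \rangle$ with $x = M$, $y = M'$ and $z$ the indicator of a single position $(r,t)$, and compares the result with the $(r,t)$ entry computed directly.

```python
import random


def trilinear_matmul_form(x, y, z):
    """Evaluate <m, n, p> = sum over r, s, t of x[r][s] * y[s][t] * z[r][t]."""
    m, n, p = len(x), len(y), len(y[0])
    return sum(x[r][s] * y[s][t] * z[r][t]
               for r in range(m) for s in range(n) for t in range(p))


def entry_via_form(M, Mp, r, t):
    """(r, t) entry of M times Mp, by setting z_rt = 1 and every other z-variable to 0."""
    m, p = len(M), len(Mp[0])
    z = [[1 if (i, j) == (r, t) else 0 for j in range(p)] for i in range(m)]
    return trilinear_matmul_form(M, Mp, z)


if __name__ == "__main__":
    m, n, p = 2, 3, 4
    M = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]
    Mp = [[random.randint(-5, 5) for _ in range(p)] for _ in range(n)]
    for r in range(m):
        for t in range(p):
            direct = sum(M[r][s] * Mp[s][t] for s in range(n))
            assert entry_via_form(M, Mp, r, t) == direct
    print("all entries match")
```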
More generally, a trilinear form $t$ is represented as

$$t = \sum_{i \in A} \sum_{j \in B} \sum_{k \in C} t_{ijk}\, x_i\, y_j\, z_k,$$

where $A$, $B$ and $C$ are three sets, $x_i$, $y_j$ and $z_k$ are formal variables and the $t_{ijk}$'s are coefficients in a field $\mathbb{F}$. Note that the sets of indices for the trilinear form $\langle m, n, p \rangle$ are $A = [m] \times [n]$, $B = [n] \times [p]$ and $C = [m] \times [p]$.

An exact (bilinear) algorithm computing $t$ corresponds to an equality of the form

$$t = \sum_{\ell=1}^{r} \Big(\sum_{i \in A} \alpha_{\ell i}\, x_i\Big)\Big(\sum_{j \in B} \beta_{\ell j}\, y_j\Big)\Big(\sum_{k \in C} \gamma_{\ell k}\, z_k\Big)$$

with coefficients $\alpha_{\ell i}, \beta_{\ell j}, \gamma_{\ell k}$ in $\mathbb{F}$. The minimum number $r$ such that such a decomposition exists is called the rank of the trilinear form $t$, and denoted $R(t)$. The rank of a trilinear form is an upper bound on the complexity of a (bilinear) algorithm computing the form: it precisely expresses the number of multiplications needed for the computation, and it can be shown that the number of additions or scalar multiplications affects the cost only in a negligible way. For any $k > 0$, the quantity $\omega(1,1,k)$ can then be equivalently defined as follows:

$$\omega(1,1,k) = \inf\{\tau \in \mathbb{R} \mid R(\langle n, n, \lfloor n^k \rfloor \rangle) = O(n^\tau)\}.$$
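A classical example of such a decomposition, not specific to this paper, is Strassen's algorithm: it writes $\langle 2,2,2 \rangle$ as a sum of $r = 7$ products, so $R(\langle 2,2,2 \rangle) \le 7$, and applying it recursively gives $\omega \le \log_2 7 \approx 2.807$, consistent with the $O(n^{2.808})$ bound quoted in the introduction. The sketch below uses the common textbook form of the seven products (variable names are ours) and checks them against the usual $2 \times 2$ product.

```python
import math
import random


def strassen_2x2(A, B):
    """Multiply 2x2 matrices with 7 multiplications (Strassen's scheme)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # Each m_l is one bilinear product: (linear form in A) * (linear form in B).
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # The output entries are linear combinations of the seven products.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(A, B):
    return [[sum(A[i][s] * B[s][j] for s in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    for _ in range(1000):
        A = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        B = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        assert strassen_2x2(A, B) == naive_2x2(A, B)
    # Recursive use on 2^j x 2^j blocks gives the exponent log_2(7).
    print(round(math.log2(7), 4))
```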

Approximate bilinear algorithms have been introduced to take advantage of the fact that the complexity of trilinear forms (and especially of matrix multiplication) may be reduced significantly by allowing small errors in the computation. Let $\lambda$ be an indeterminate over $\mathbb{F}$, and let $\mathbb{F}[\lambda]$ denote the set of all polynomials over $\mathbb{F}$ in $\lambda$. Let $s$ be any nonnegative integer. A $\lambda$-approximate algorithm for $t$ is an equality of the form

$$\lambda^s\, t + \lambda^{s+1}\Big(\sum_{i \in A} \sum_{j \in B} \sum_{k \in C} d_{ijk}\, x_i y_j z_k\Big) = \sum_{\ell=1}^{r} \Big(\sum_{i \in A} \alpha_{\ell i}\, x_i\Big)\Big(\sum_{j \in B} \beta_{\ell j}\, y_j\Big)\Big(\sum_{k \in C} \gamma_{\ell k}\, z_k\Big)$$

for coefficients $\alpha_{\ell i}, \beta_{\ell j}, \gamma_{\ell k}, d_{ijk}$ in $\mathbb{F}[\lambda]$ and some nonnegative integer $s$. Informally, this means that the form $t$ can be computed by determining the coefficient of $\lambda^s$ in the right-hand side. The minimum number $r$ such that such a decomposition exists is called the border rank of $t$ and denoted $\underline{R}(t)$. It is known that the border rank is an upper bound on the complexity of an algorithm that approximates the trilinear form, and that any such approximation algorithm can be converted into an exact algorithm with essentially the same complexity.
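A standard small example, again not taken from this paper, shows how approximation can help. The tensor $t = x_0y_0z_1 + x_0y_1z_0 + x_1y_0z_0$ is known to have rank 3, but with $s = 1$ the two products $(x_0 + \lambda x_1)(y_0 + \lambda y_1)(z_0 + \lambda z_1)$ and $(-x_0)(y_0)(z_0)$ sum to $\lambda t + O(\lambda^2)$, so $\underline{R}(t) \le 2$. The sketch below expands these products symbolically, representing a form as a dictionary from $(i, j, k, e)$ to the coefficient of $\lambda^e x_i y_j z_k$; the encoding and function names are our own assumptions.

```python
from collections import defaultdict
from itertools import product


def expand_product(u, v, w):
    """Expand (sum u)(sum v)(sum w) for linear forms with coefficients in F[lambda].

    Each linear form maps a variable index to {power_of_lambda: coefficient}.
    The result maps (i, j, k, power) to the coefficient of lambda^power x_i y_j z_k.
    """
    out = defaultdict(int)
    for (i, pu), (j, pv), (k, pw) in product(u.items(), v.items(), w.items()):
        for (a, ca), (b, cb), (c, cc) in product(pu.items(), pv.items(), pw.items()):
            out[(i, j, k, a + b + c)] += ca * cb * cc
    return out


def approximate_decomposition():
    """Sum of the two products; returns only the non-zero coefficients."""
    lin = {0: {0: 1}, 1: {1: 1}}  # the linear form v_0 + lambda * v_1
    terms = [(lin, lin, lin), ({0: {0: -1}}, {0: {0: 1}}, {0: {0: 1}})]
    total = defaultdict(int)
    for u, v, w in terms:
        for key, coeff in expand_product(u, v, w).items():
            total[key] += coeff
    return {key: c for key, c in total.items() if c != 0}


def coefficient_of_power(form, s):
    """The trilinear form given by the coefficient of lambda^s."""
    return {(i, j, k): c for (i, j, k, e), c in form.items() if e == s}


if __name__ == "__main__":
    form = approximate_decomposition()
    print(min(e for (_, _, _, e) in form))  # lowest power of lambda left over
    print(coefficient_of_power(form, 1))    # the tensor t
```

The $\lambda^0$ terms cancel, so the lowest surviving power is $\lambda^1$ and its coefficient is exactly $t$, as the definition above requires for $s = 1$.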

A sum $\sum_i t_i$ of trilinear forms is a direct sum if the $t_i$'s do not share variables. Informally, Schönhage's asymptotic sum inequality [23] for rectangular matrix multiplication states that, if the form $t$ can be converted (in the $\lambda$-approximation sense) into a direct sum of $c$ trilinear forms, each form being isomorphic to $\langle m, m, m^k \rangle$, then

$$c \cdot m^{\omega(1,1,k)} \le \underline{R}(t).$$

This suggests that good bounds on $\omega(1,1,k)$ can be obtained if the form $t$ can be used to derive many independent (i.e., not sharing any variables) matrix multiplications. This approach has been applied to derive almost all new bounds on matrix multiplication since its discovery by Schönhage in 1981.
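Taking logarithms, the inequality reads $\omega(1,1,k) \le \log(\underline{R}(t)/c)/\log m$. The helper below (our own naming) just evaluates this expression. As a sanity check with $t = \langle 2,2,2 \rangle$, $c = 1$, $m = 2$ and $k = 1$, whose border rank is at most its rank, at most 7 by Strassen's algorithm, it returns $\log_2 7 \approx 2.807$.

```python
import math


def exponent_bound(c: int, m: float, border_rank_bound: float) -> float:
    """Upper bound on omega(1, 1, k) from c * m**omega(1, 1, k) <= border rank of t.

    c: number of independent copies of <m, m, m^k> obtained from t.
    border_rank_bound: any known upper bound on the border rank of t.
    """
    if c < 1 or m <= 1 or border_rank_bound < c:
        raise ValueError("need c >= 1, m > 1 and border_rank_bound >= c")
    return math.log(border_rank_bound / c) / math.log(m)


if __name__ == "__main__":
    print(exponent_bound(1, 2, 7))   # log_2(7), Strassen's exponent
    print(exponent_bound(2, 2, 14))  # same ratio R/c, same bound
```

The bound depends only on the ratio $\underline{R}(t)/c$ and on $m$, which is why the analysis tries to extract as many independent copies as possible from a form of small border rank.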



Overview of our techniques. Our algorithm uses, as its basic construction, the trilinear form $F_q \otimes F_q$ from [10], which can be written as a sum of fifteen terms $T_{ijk}$, one for each of the fifteen triples of nonnegative integers $(i, j, k)$ such that $i + j + k = 4$:

$$F_q \otimes F_q = \sum_{\substack{0 \le i,j,k \le 4 \\ i+j+k=4}} T_{ijk}.$$
If the sum were direct, then, from Schönhage's asymptotic sum inequality and since an upper bound on the rank of $F_q \otimes F_q$ is easy to obtain, this would reduce the problem to the analysis of each part $T_{ijk}$. This is unfortunately not the case: the $T_{ijk}$'s share variables. To solve this problem, the basic construction is manipulated in order to obtain a direct sum, similarly to [10]. The first step is to take the $N$-th tensor power of the basic construction, where $N$ is a large integer. This gives:

$$(F_q \otimes F_q)^{\otimes N} = \sum_{IJK} T_{IJK},$$

where the sum is over all triples of sequences $IJK$ with $I, J, K \in \{0, 1, 2, 3, 4\}^N$ such that $I_\ell + J_\ell + K_\ell = 4$ for all $\ell \in \{1, \ldots, N\}$. Here each form $T_{IJK}$ is the tensor product of $N$ forms $T_{ijk}$. The sum is nevertheless not yet direct. The key idea of the next step is to zero variables in order to remove some forms and obtain a sum where any two non-zero forms $T_{IJK}$ and $T_{I'J'K'}$ are such that $I \neq I'$, $J \neq J'$ and $K \neq K'$, which will imply that the sum is direct. Moreover, in order to be able to apply Schönhage's asymptotic sum inequality, we would like all the remaining $T_{IJK}$ to be isomorphic to the same rectangular matrix product (i.e., there should exist values $m$ and $k$ such that each non-zero form $T_{IJK}$ is isomorphic to the matrix multiplication $\langle m, m, m^k \rangle$). From here our approach differs from Coppersmith and Winograd [10], since our concern is upper bounds for rectangular matrix multiplication.
We will take fifteen values $a_{ijk}$ and show how the form $(F_q \otimes F_q)^{\otimes N}$ can be transformed, by zeroing variables, into a direct sum of a large number of forms $T_{IJK}$ in which each $T_{IJK}$ is such that

$$T_{IJK} \cong \bigotimes_{\substack{0 \le i,j,k \le 4 \\ i+j+k=4}} T_{ijk}^{\otimes a_{ijk} N}. \qquad (2)$$

The main difference here is that, in [10], the symmetry of square matrix multiplication implied that the $a_{ijk}$'s could be taken invariant under permutation of indices, which means that only four parameters ($a_{004}$, $a_{013}$, $a_{022}$ and $a_{112}$) needed to be considered. In our case, we still impose the condition $a_{ijk} = a_{ikj}$, but not more. This reduces the number of parameters to nine: $a_{004}$, $a_{400}$, $a_{013}$, $a_{103}$, $a_{301}$, $a_{022}$, $a_{202}$, $a_{112}$ and $a_{211}$. Many nontrivial technical problems arise from this larger number of parameters. In particular, the equations that occur during the analysis do not have a unique solution and an optimization step is necessary. This is similar to the difficulties that appeared in the analysis of the basic construction $F_q^{\otimes 4}$ (for square matrix multiplication) done in [24, 27]. We will show that this further optimization step essentially imposes the additional (nonlinear) constraint $a_{013}\, a_{202}\, a_{112} = a_{103}\, a_{022}\, a_{211}$.
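The parameter count can be checked mechanically. The sketch below (names are ours) lists the fifteen triples $(i,j,k)$ with $i+j+k=4$, groups them into orbits under the swap $(i,j,k) \mapsto (i,k,j)$ that the condition $a_{ijk} = a_{ikj}$ identifies, and keeps the lexicographically smallest member of each orbit.

```python
# The fifteen index triples (i, j, k) of nonnegative integers with i + j + k = 4.
TRIPLES = [(i, j, 4 - i - j) for i in range(5) for j in range(5 - i)]


def orbit_representatives(triples):
    """Group triples under (i, j, k) -> (i, k, j); keep the lexicographically smallest."""
    return sorted({min(t, (t[0], t[2], t[1])) for t in triples})


if __name__ == "__main__":
    reps = orbit_representatives(TRIPLES)
    print(len(TRIPLES), len(reps))
    print(", ".join("a" + "".join(map(str, t)) for t in reps))
```

It finds 15 triples and 9 orbits, whose representatives are exactly the nine parameters $a_{004}, a_{013}, a_{022}, a_{103}, a_{112}, a_{202}, a_{211}, a_{301}, a_{400}$ listed above.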

Showing that $(F_q \otimes F_q)^{\otimes N}$ can be transformed into a direct sum of many isomorphic forms, as claimed, was also a key step in previous works based on the approach by Coppersmith and Winograd. Our setting is nevertheless more general than in [10], for the following two reasons. First, our approach is asymmetric since our parameters are not invariant under permutations of the indices. Second, the technical problems due to the presence of many parameters require more precise arguments. The former complication was addressed implicitly in [9], and explicitly in [13, 16]. The latter complication was addressed in [24, 27]. Here we need to deal with these two complications simultaneously, which involves a careful analysis. Instead of approaching this task directly in the language of trilinear forms, we give a graph-theoretic interpretation of it, which will make the exposition more intuitive and also simplify the analysis. Each form $T_{IJK}$ will correspond to one vertex in a graph, and an edge in the graph will represent the fact that two forms share one index. We will interpret the task of converting the sum $(F_q \otimes F_q)^{\otimes N}$ into a direct sum (i.e., a sum where the non-zero forms $T_{IJK}$ do not share any index) as the task of converting the graph, by using only simple graph operations, into an edgeless subgraph. We will present algorithms solving this latter graph-theoretic task and show that a large edgeless subgraph can be obtained, which means that many forms $T_{IJK}$ not sharing any index can be constructed from $(F_q \otimes F_q)^{\otimes N}$.
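The following toy sketch illustrates the graph-theoretic picture for a very small $N$. It is our own illustration, not the algorithm of Section 5. Forms $T_{IJK}$ are encoded by their index sequences; zeroing variables is modeled as keeping only some $x$-, $y$- and $z$-blocks, and a form survives only if its three blocks are all kept. A greedy pass (an assumption made for illustration) adds forms one at a time and accepts a form only if the surviving forms remain pairwise disjoint, i.e., an edgeless subgraph of the conflict graph.

```python
from itertools import product

# The fifteen triples (i, j, k) with i + j + k = 4 indexing the parts T_ijk.
TRIPLES = [(i, j, 4 - i - j) for i in range(5) for j in range(5 - i)]


def forms(n):
    """Yield (I, J, K) for every term T_IJK of the n-th tensor power."""
    for choice in product(TRIPLES, repeat=n):
        I = tuple(t[0] for t in choice)
        J = tuple(t[1] for t in choice)
        K = tuple(t[2] for t in choice)
        yield I, J, K


def survivors(all_forms, keep_i, keep_j, keep_k):
    """Forms still non-zero once every x-, y- and z-block outside the kept sets is zeroed."""
    return [f for f in all_forms
            if f[0] in keep_i and f[1] in keep_j and f[2] in keep_k]


def greedy_direct_sum(n):
    """Toy greedy search for forms that pairwise share no block."""
    all_forms = list(forms(n))
    keep_i, keep_j, keep_k = set(), set(), set()
    chosen = []
    for I, J, K in all_forms:
        if I in keep_i or J in keep_j or K in keep_k:
            continue  # this vertex is adjacent to an already chosen one
        trial = (keep_i | {I}, keep_j | {J}, keep_k | {K})
        # Accept only if zeroing leaves exactly the chosen forms plus this one.
        if len(survivors(all_forms, *trial)) == len(chosen) + 1:
            keep_i, keep_j, keep_k = trial
            chosen.append((I, J, K))
    return chosen


if __name__ == "__main__":
    selected = greedy_direct_sum(2)
    print(len(selected), "pairwise disjoint forms out of", len(list(forms(2))))
```

Every accepted form keeps the survivor set equal to the chosen set, so the result is a direct sum obtainable purely by zeroing blocks. The constructions in the paper are designed to make such direct sums large as $N$ grows, which this greedy toy does not attempt.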

Now that we have a direct sum of many forms, each form being isomorphic to the form in Equation (2), the only thing to do before applying Schönhage's asymptotic sum inequality is to show that this form is isomorphic to a direct sum of matrix products $\langle m, m, m^k \rangle$. Some of the $T_{ijk}$'s (more precisely, all the $T_{ijk}$'s except $T_{112}$, $T_{121}$ and $T_{211}$) can be analyzed in a straightforward way, since they correspond to matrix products, as originally observed in [10]. The forms $T_{112}$, $T_{121}$ and $T_{211}$ are delicate to analyze since they do not correspond to matrix products. In [10] (see also [24, 27]), they were analyzed individually through the concept of "value", a quantity that evaluates the number of square matrix products (and their size) that can be created from the form under consideration. This is nevertheless useless for estimating $\omega(1,1,k)$, except when $k = 1$, because the value is intrinsically symmetric (in particular, the values of $T_{112}$, $T_{121}$ and $T_{211}$ are identical), while here we are precisely interested in breaking the symmetry in order to obtain bounds for rectangular matrix multiplication. Instead, we will analyze the term

$$T_{112}^{\otimes a_{112} N} \otimes T_{121}^{\otimes a_{121} N} \otimes T_{211}^{\otimes a_{211} N} \qquad (3)$$

globally. This is the key new idea leading to our new bounds on $\alpha$ and, more generally, on $\omega(1,1,k)$ for any $k \neq 1$. Note that this difficulty was not present in previous works on rectangular matrix multiplication [9, 13, 16]: for simpler basic constructions such as $F_q$, all the smaller parts correspond to matrix products. We will show that $T_{112}$, $T_{121}$ and $T_{211}$ can be converted into a large number of objects called "$\mathcal{C}$-tensors" in Strassen's terminology [26]. This will be done by relying on the graph-theoretic interpretation we introduced and showing how this conversion can be interpreted as finding large cliques in a graph (this is the main reason why we developed this graph-theoretic interpretation). While the fact that the form $T_{112}$ corresponds to a sum of $\mathcal{C}$-tensors was briefly mentioned in [10], and a proof sketched, we will need a complete analysis here. We will in particular rely on the fact that the $\mathcal{C}$-tensors obtained from $T_{112}$, $T_{121}$ and $T_{211}$ are not identical. The success of our approach comes from the discovery that these $\mathcal{C}$-tensors are actually "complementary": while the $\mathcal{C}$-tensors obtained individually from $T_{112}$, $T_{121}$ and $T_{211}$ do not give any improvement for the exponent of rectangular matrix multiplication, their combination (i.e., the $\mathcal{C}$-tensors corresponding to the whole term (3)) does lead to improvements when analyzed globally by the "laser method" developed by Strassen [26]. This will show that the form (3), and thus the form (2) as well, is isomorphic to a direct sum of matrix products $\langle m, m, m^k \rangle$.

Finally, Schönhage's asymptotic sum inequality will give an inequality, depending on the parameters $a_{ijk}$, that involves $\omega(1,1,k)$. Our new upper bounds on $\omega(1,1,k)$ are obtained by optimizing these parameters. While this is done essentially through numerical calculations, the new lower bound on $\alpha$ requires a more careful analysis where the optimal values of all but a few of the parameters are found analytically.

Higher powers. A natural question is whether our bounds can be improved by taking higher tensor powers of $F_q$ as the basic construction, i.e., taking $F_q^{\otimes r}$ for $r > 2$. As can be expected from the analysis for $r = 2$ outlined above, the analysis is much more difficult than in the square case, for two main reasons. The first reason is that the construction is not symmetric and thus more parameters have to be considered. This problem can nevertheless be addressed through a systematic framework similar to the one described in [27]; this is actually quite accessible without using a computer for $r = 4$. The second, and more fundamental, reason is that the analysis of the smaller parts is different from the square case since it does not use the concept of value. We solved this problem for $r = 2$ by applying Strassen's laser method globally on the combination of $T_{112}$, $T_{121}$ and $T_{211}$, which is the key technical contribution of this paper. For larger values of $r$ the same approach can in principle be used, but other techniques seem to be necessary to convert these ideas into a systematic framework.

Organization of the paper. Section 2 formally describes the notions of trilinear forms and Strassen's laser method. Section 3 presents in detail the basic construction $F_q \otimes F_q$ from [10] and some of its properties. Section 4 describes the graph-theoretic problems that will arise in the analysis of our trilinear forms. Section 5 describes our algorithm, and Section 6 shows how to use this algorithm to derive the optimization problem giving our new upper bounds on $\omega(1,1,k)$. Finally, the optimization is done in Section 7.
2 Preliminaries



In this section we present known results about trilinear forms, Strassen's laser method and Salem-Spencer sets that we will use in this paper. We refer to [5] for an extensive treatment of the first two topics.

2.1 Trilinear forms, degeneration of tensors and Schönhage's asymptotic sum inequality

It will be convenient for us to use an abstract approach and represent trilinear forms as tensors. Our presentation is independent from what was informally defined and stated in Subsection 1.2, but describes essentially the same content. We thus encourage readers who encounter these notions for the first time to refer to Subsection 1.2 for concrete illustrations.
We assume that $\mathbb{F}$ is an arbitrary field.

Let $U=\mathbb{F}^u$, $V=\mathbb{F}^v$ and $W=\mathbb{F}^w$ be three vector spaces over $\mathbb{F}$, where $u$, $v$ and $w$ are three positive integers. A tensor $t$ of format $(u,v,w)$ is an element of $U\otimes V\otimes W=\mathbb{F}^{u\times v\times w}$, where $\otimes$ denotes the tensor product. If we fix bases $\{x_i\}$, $\{y_j\}$ and $\{z_k\}$ of $U$, $V$ and $W$, respectively, then we can express $t$ as

$$t=\sum_{ijk}t_{ijk}\,x_i\otimes y_j\otimes z_k$$

for coefficients $t_{ijk}$ in $\mathbb{F}$. The tensor $t$ can then be represented by the 3-dimensional array $[t_{ijk}]$. We will often write $x_i\otimes y_j\otimes z_k$ simply as $x_iy_jz_k$.

The tensor corresponding to the multiplication of an $m\times n$ matrix by an $n\times p$ matrix is the tensor of format $(m\times n,n\times p,m\times p)$ with coefficients

$$t_{ijk}=\begin{cases}1&\text{if } i=(r,s),\ j=(s,t)\text{ and }k=(r,t)\text{ for some integers }(r,s,t)\in[m]\times[n]\times[p],\\0&\text{otherwise.}\end{cases}$$

This tensor will be denoted by $\langle m,n,p\rangle$.

Another important example is the tensor $\sum_{\ell=1}^{n}x_\ell y_\ell z_\ell$ of format $(n,n,n)$. This tensor is denoted $\langle n\rangle$ and corresponds to $n$ independent scalar products.

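Let $\lambda$ be an indeterminate over $\mathbb{F}$ and consider the ring $\mathbb{F}[\lambda]$ of all polynomials over $\mathbb{F}$ in $\lambda$. A triple of matrices $\alpha\in\mathbb{F}[\lambda]^{u'\times u}$, $\beta\in\mathbb{F}[\lambda]^{v'\times v}$ and $\gamma\in\mathbb{F}[\lambda]^{w'\times w}$ transforms $t\in\mathbb{F}^{u\times v\times w}$ into the tensor $(\alpha\otimes\beta\otimes\gamma)t\in\mathbb{F}[\lambda]^{u'\times v'\times w'}$ defined as

$$(\alpha\otimes\beta\otimes\gamma)t=\sum_{ijk}t_{ijk}\,\alpha(x_i)\otimes\beta(y_j)\otimes\gamma(z_k).$$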

This new tensor is called a restriction of $t$. Intuitively, the fact that a tensor $t'$ is a restriction of $t$ means that an algorithm computing $t$ can be converted into an algorithm computing $t'$ that uses the same number of multiplications (i.e., an algorithm with essentially the same complexity). We now give the definition of degeneration of tensors.

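Definition 2.1. Let $t\in\mathbb{F}^{u\times v\times w}$ and $t'\in\mathbb{F}^{u'\times v'\times w'}$ be two tensors. We say that $t'$ is a degeneration of $t$, denoted $t'\trianglelefteq t$, if there exist matrices $\alpha\in\mathbb{F}[\lambda]^{u'\times u}$, $\beta\in\mathbb{F}[\lambda]^{v'\times v}$ and $\gamma\in\mathbb{F}[\lambda]^{w'\times w}$ such that

$$\lambda^s t'+\lambda^{s+1}t''=(\alpha\otimes\beta\otimes\gamma)t$$

for some tensor $t''\in\mathbb{F}[\lambda]^{u'\times v'\times w'}$ and some nonnegative integer $s$.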

Intuitively, the fact that a tensor t' is a degeneration of a tensor t means that an algorithm computing t 
can be converted into an "approximate algorithm" computing t' with essentially the same complexity. 
The notion of degeneration can be used to define the notion of border rank. 

Definition 2.2. Let $t$ be a tensor. The border rank of $t$ is $\underline{R}(t)=\min\{r\in\mathbb{N}\mid t\trianglelefteq\langle r\rangle\}$.
Let $t\in U\otimes V\otimes W$ and $t'\in U'\otimes V'\otimes W'$ be two tensors. We can naturally define the direct sum $t\oplus t'$, which is a tensor in $(U\oplus U')\otimes(V\oplus V')\otimes(W\oplus W')$, and the tensor product $t\otimes t'$, which is a tensor in $(U\otimes U')\otimes(V\otimes V')\otimes(W\otimes W')$. For any integer $c\ge1$, we will denote the tensor $t\oplus\cdots\oplus t$ (with $c$ occurrences of $t$) by $c\cdot t$ and the tensor $t\otimes\cdots\otimes t$ (with $c$ occurrences of $t$) by $t^{\otimes c}$.

The degeneration of tensors has the following properties.

Proposition 2.1 (Proposition 15.25 in [5]). Let $t_1$, $t_1'$, $t_2$ and $t_2'$ be four tensors. Suppose that $t_1'\trianglelefteq t_1$ and $t_2'\trianglelefteq t_2$. Then $t_1'\oplus t_2'\trianglelefteq t_1\oplus t_2$ and $t_1'\otimes t_2'\trianglelefteq t_1\otimes t_2$.

Schönhage's asymptotic sum inequality [23] will be one of the main tools used to prove our bounds. Its original statement is for estimating the exponent of square matrix multiplication, but it can easily be generalized to estimate the exponent of rectangular matrix multiplication as well. We will use the following form, which has also been used implicitly in [13, 16]. A proof can be found in [11].

Theorem 2.1 (Schönhage's asymptotic sum inequality). Let $k$, $m$ and $c$ be three positive integers. Let $t$ be a tensor such that $c\cdot\langle m,m,m^k\rangle\trianglelefteq t$. Then

$$c\cdot m^{\omega(1,1,k)}\le\underline{R}(t).$$

Theorem 2.1 states that, if the form $t$ can be degenerated (i.e., approximately converted, in the sense of Definition 2.1) into a direct sum of $c$ forms, each being isomorphic to $\langle m,m,m^k\rangle$, then the inequality $c\cdot m^{\omega(1,1,k)}\le\underline{R}(t)$ holds. Note that this is a powerful technique, since the concepts of degeneration and border rank refer to "approximate algorithms", while $\omega(1,1,k)$ refers to the complexity of exact algorithms for rectangular matrix multiplication.

2.2 Strassen's laser method and $\mathcal{C}$-tensors

Strassen [25] introduced in 1986 a new approach, often referred to as the laser method, to derive upper bounds on the exponent of matrix multiplication. To the best of our knowledge, all the applications of this method have so far focused on square matrix multiplication, in which case several simplifications can be made due to the symmetry of the problem. In this paper we will nevertheless need the full power of the laser method, and in particular the notion of $\mathcal{C}$-tensor introduced in [25, 26], to derive our new bounds on the exponent of rectangular matrix multiplication. The exposition below will mainly follow [5].

Let $t\in U\otimes V\otimes W$ be a tensor. Suppose that $U$, $V$ and $W$ decompose as direct sums of subspaces as follows:

$$U=\bigoplus_{i\in S_U}U_i,\qquad V=\bigoplus_{j\in S_V}V_j,\qquad W=\bigoplus_{k\in S_W}W_k.$$

Denote by $D$ this decomposition. We say that $t$ is a $\mathcal{C}$-tensor with respect to $D$ if $t$ can be written as

$$t=\sum_{(i,j,k)\in S_U\times S_V\times S_W}t_{ijk},$$

where each $t_{ijk}$ is a tensor in $U_i\otimes V_j\otimes W_k$. The support of $t$ is defined as

$$\mathrm{supp}_D(t)=\{(i,j,k)\in S_U\times S_V\times S_W\mid t_{ijk}\neq0\},$$

and the nonzero $t_{ijk}$'s are called the components of $t$. We will usually omit the reference to $D$ when there is no ambiguity or when the decomposition does not matter.

As a simple example, consider the complete decompositions of the spaces $U=\mathbb{F}^{m\times n}$, $V=\mathbb{F}^{n\times p}$ and $W=\mathbb{F}^{m\times p}$ (i.e., their decomposition as direct sums of one-dimensional subspaces, each subspace being spanned by one element of their basis). With respect to this decomposition, the tensor of matrix multiplication $\langle m,n,p\rangle$ is a $\mathcal{C}$-tensor with support

$$\mathrm{supp}_c(\langle m,n,p\rangle)=\{((r,s),(s,t),(r,t))\mid(r,s,t)\in[m]\times[n]\times[p]\},$$

where each component is trivial (i.e., isomorphic to $\langle1,1,1\rangle$). In this paper the notation $\mathrm{supp}_c(\langle m,n,p\rangle)$ will always refer to the support of $\langle m,n,p\rangle$ with respect to this complete decomposition.
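The support $\mathrm{supp}_c(\langle m,n,p\rangle)$ can be enumerated directly. The following Python sketch uses 0-based indices instead of $[m]$, $[n]$, $[p]$ (a notational assumption) and confirms that the support has $mnp$ elements, one per trivial component.

```python
from itertools import product

def supp_c(m, n, p):
    """Support of <m, n, p> w.r.t. the complete decomposition:
    all triples ((r, s), (s, t), (r, t)) with (r, s, t) in [m] x [n] x [p] (0-based)."""
    return {((r, s), (s, t), (r, t))
            for r, s, t in product(range(m), range(n), range(p))}

S = supp_c(2, 3, 4)
print(len(S))  # 24
```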

We now introduce the concept of combinatorial degeneration. A subset $\Delta$ of $S_U\times S_V\times S_W$ is called diagonal if the three projections $\Delta\to S_U$, $\Delta\to S_V$ and $\Delta\to S_W$ are injective. Let $\Phi$ be a subset of $S_U\times S_V\times S_W$. A set $\Psi\subseteq\Phi$ is a combinatorial degeneration of $\Phi$ if there exist three functions $a\colon S_U\to\mathbb{Z}$, $b\colon S_V\to\mathbb{Z}$ and $c\colon S_W\to\mathbb{Z}$ such that

• for all $(i,j,k)\in\Psi$, $a(i)+b(j)+c(k)=0$;

• for all $(i,j,k)\in\Phi\setminus\Psi$, $a(i)+b(j)+c(k)>0$.
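Both notions can be checked mechanically once the functions $a$, $b$ and $c$ are given. The sketch below (our own helper names, with $a$, $b$, $c$ passed as dictionaries) tests diagonality and the two conditions above on a hypothetical toy support over $\{0,1\}^3$. It only verifies given witnesses; it does not search for them.

```python
def is_diagonal(Delta):
    """Diagonal: each of the three coordinate projections is injective on Delta."""
    return all(len({t[pos] for t in Delta}) == len(Delta) for pos in range(3))

def is_combinatorial_degeneration(Psi, Phi, a, b, c):
    """Check a(i) + b(j) + c(k) == 0 on Psi and > 0 on Phi minus Psi."""
    if not Psi <= Phi:
        return False
    zero_on_psi = all(a[i] + b[j] + c[k] == 0 for i, j, k in Psi)
    positive_elsewhere = all(a[i] + b[j] + c[k] > 0 for i, j, k in Phi - Psi)
    return zero_on_psi and positive_elsewhere

# Hypothetical toy data with S_U = S_V = S_W = {0, 1}.
Phi = {(0, 0, 0), (1, 1, 1), (0, 1, 1)}
Psi = {(0, 0, 0), (1, 1, 1)}
a, b, c = {0: 1, 1: 0}, {0: -1, 1: 0}, {0: 0, 1: 0}
print(is_diagonal(Psi), is_combinatorial_degeneration(Psi, Phi, a, b, c))  # True True
```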

The most useful application of combinatorial degeneration will be the following result, which states that the sum, over indices in a diagonal combinatorial degeneration of $\mathrm{supp}_D(t)$, of the components $t_{ijk}$ is direct and is a degeneration of $t$.

Proposition 2.2 (Proposition 15.30 in [5]). Let $t$ be a $\mathcal{C}$-tensor with support $\mathrm{supp}_D(t)$ and components $t_{ijk}$. Let $\Delta\subseteq\mathrm{supp}_D(t)$ be a combinatorial degeneration of $\mathrm{supp}_D(t)$ and assume that $\Delta$ is diagonal. Then

$$\bigoplus_{(i,j,k)\in\Delta}t_{ijk}\trianglelefteq t.$$

In this work we will construct $\mathcal{C}$-tensors where all the components are isomorphic to $\langle m,m,m^k\rangle$ for some values $m$ and $k$. Proposition 2.2 then suggests that a good bound on the exponent of rectangular matrix multiplication can be derived, via Theorem 2.1, if the support of the $\mathcal{C}$-tensor contains a large diagonal combinatorial degeneration. When this support is isomorphic to $\mathrm{supp}_c(\langle e,h,\ell\rangle)$ for some positive integers $e$, $h$ and $\ell$, a powerful tool to construct large diagonal combinatorial degenerations is given by the following result by Strassen (Theorem 6.6 in [26]), restated in our terminology.

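Proposition 2.3 ([26]). Let $e_1$, $e_2$ and $e_3$ be three positive integers such that $e_1\le e_2\le e_3$. For any permutation $\sigma$ of $\{1,2,3\}$, there exists a diagonal set $\Delta\subseteq\mathrm{supp}_c(\langle e_{\sigma(1)},e_{\sigma(2)},e_{\sigma(3)}\rangle)$ with

$$|\Delta|=\begin{cases}e_1e_2-\left\lfloor\dfrac{(e_1+e_2-e_3)^2}{4}\right\rfloor&\text{if }e_1+e_2\ge e_3,\\[2mm] e_1e_2&\text{otherwise,}\end{cases}$$

that is a combinatorial degeneration of $\mathrm{supp}_c(\langle e_{\sigma(1)},e_{\sigma(2)},e_{\sigma(3)}\rangle)$. In particular, $|\Delta|\ge\lceil3e_1e_2/4\rceil$.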
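The size in Proposition 2.3 is easy to tabulate. The following sketch assumes that this size is $e_1e_2-\lfloor(e_1+e_2-e_3)^2/4\rfloor$ when $e_1+e_2\ge e_3$ and $e_1e_2$ otherwise, and checks numerically, for small values, the consequence $|\Delta|\ge\lceil3e_1e_2/4\rceil$ (which holds because $0\le e_1+e_2-e_3\le e_1$ in the first case). It computes only the size, not the set $\Delta$ itself.

```python
def strassen_diagonal_size(e1, e2, e3):
    """Size of the diagonal combinatorial degeneration of Proposition 2.3,
    assuming the floor form e1*e2 - floor((e1 + e2 - e3)**2 / 4) when e1 + e2 >= e3."""
    if not 1 <= e1 <= e2 <= e3:
        raise ValueError("expected 1 <= e1 <= e2 <= e3")
    if e1 + e2 >= e3:
        return e1 * e2 - (e1 + e2 - e3) ** 2 // 4
    return e1 * e2

def ceil_three_quarters(n):
    """ceil(3n / 4) with integer arithmetic."""
    return -(-3 * n // 4)

print(strassen_diagonal_size(4, 4, 4), ceil_three_quarters(16))  # 12 12
```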

Finally, we mention that the concept of $\mathcal{C}$-tensor is preserved by the tensor product. We will just state this property for the restricted class of $\mathcal{C}$-tensors that we will encounter in this paper (for which the precise decompositions do not matter), and refer to [5] or to Section 7 in [25] for a complete treatment.

Proposition 2.4. Let $t$ be a $\mathcal{C}$-tensor with support isomorphic to $\mathrm{supp}_c(\langle e,h,\ell\rangle)$ in which each component is isomorphic to $\langle m,n,p\rangle$. Let $t'$ be a $\mathcal{C}$-tensor with support isomorphic to $\mathrm{supp}_c(\langle e',h',\ell'\rangle)$ in which each component is isomorphic to $\langle m',n',p'\rangle$. Then $t\otimes t'$ is a $\mathcal{C}$-tensor with support isomorphic to $\mathrm{supp}_c(\langle ee',hh',\ell\ell'\rangle)$ in which each component is isomorphic to $\langle mm',nn',pp'\rangle$.
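At the level of supports, Proposition 2.4 amounts to a re-indexing: pairing $((r,s),(s,t),(r,t))$ with $((r',s'),(s',t'),(r',t'))$ and encoding $(r,r')$ as $re'+r'$, $(s,s')$ as $sh'+s'$ and $(t,t')$ as $t\ell'+t'$ yields exactly $\mathrm{supp}_c(\langle ee',hh',\ell\ell'\rangle)$. The sketch below checks this on small cases. The encoding is one convenient choice of isomorphism, assumed for illustration, and the components themselves are not modeled.

```python
from itertools import product

def supp_c(e, h, ell):
    """supp_c(<e, h, ell>) with 0-based indices."""
    return {((r, s), (s, t), (r, t))
            for r, s, t in product(range(e), range(h), range(ell))}

def product_support(dims1, dims2):
    """Support of t (x) t' for supports supp_c(dims1) and supp_c(dims2),
    re-encoding each index pair (a, a') as a * size' + a'."""
    (e1, h1, l1), (e2, h2, l2) = dims1, dims2

    def enc(p, q, sizes):
        return (p[0] * sizes[0] + q[0], p[1] * sizes[1] + q[1])

    return {(enc(x, x2, (e2, h2)), enc(y, y2, (h2, l2)), enc(z, z2, (e2, l2)))
            for x, y, z in supp_c(e1, h1, l1)
            for x2, y2, z2 in supp_c(e2, h2, l2)}

print(product_support((2, 3, 1), (1, 2, 2)) == supp_c(2, 6, 2))  # True
```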



2.3 Salem-Spencer sets 

Let $M$ be a positive integer and consider $\mathbb{Z}_M=\{0,1,\ldots,M-1\}$. We say that a set $B\subseteq\mathbb{Z}_M$ has no length-3 arithmetic progression if, for any three elements $b_1$, $b_2$ and $b_3$ in $B$,

$$b_1+b_2\equiv2b_3\pmod M\iff b_1=b_2=b_3.$$

Salem and Spencer have shown the existence of very dense sets with no length-3 arithmetic progression.
Theorem 2.2 ([20]). For any $\epsilon>0$, there exists an integer $M_\epsilon$ such that, for any integer $M>M_\epsilon$, there is a set $B\subseteq\mathbb{Z}_M$ of size $|B|>M^{1-\epsilon}$ with no length-3 arithmetic progression.

We refer to these sets as Salem-Spencer sets. They can be constructed in time polynomial in $M$. Note that this construction has been improved by Behrend [4], but the above statement will be enough for our purpose.
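For small moduli the defining condition can be checked by brute force. The sketch below tests the condition literally and builds a progression-free subset of $\mathbb{Z}_M$ greedily. The greedy rule is only an illustration: it is not the Salem-Spencer or Behrend construction and gives no guarantee of size $M^{1-\epsilon}$.

```python
def has_no_3ap(B, M):
    """True iff b1 + b2 = 2*b3 (mod M) forces b1 = b2 = b3 for all b1, b2, b3 in B."""
    B = list(B)
    return all((b1 + b2 - 2 * b3) % M != 0 or b1 == b2 == b3
               for b1 in B for b2 in B for b3 in B)

def greedy_three_ap_free(M):
    """Scan 0, 1, ..., M-1 and keep x whenever B + [x] stays progression-free."""
    B = []
    for x in range(M):
        if has_no_3ap(B + [x], M):
            B.append(x)
    return B

print(greedy_three_ap_free(9))  # [0, 1, 3, 4]
```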

3 Coppersmith-Winograd's construction

In this section we describe the construction by Coppersmith and Winograd [10], which we will use as 
the basis of our algorithm, and several of its properties. 

We start with the simpler construction presented in Section 7 of [10]. For any positive integer $q$, let us define the following trilinear form $F_q$, where $\lambda$ is an indeterminate over $\mathbb{F}$.

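$$\begin{aligned}F_q={}&\sum_{i=1}^{q}\lambda^{-2}(x_0+\lambda x_i)(y_0+\lambda y_i)(z_0+\lambda z_i)\\&-\lambda^{-3}\Big(x_0+\lambda^2\sum_{i=1}^{q}x_i\Big)\Big(y_0+\lambda^2\sum_{i=1}^{q}y_i\Big)\Big(z_0+\lambda^2\sum_{i=1}^{q}z_i\Big)\\&+(\lambda^{-3}-q\lambda^{-2})(x_0+\lambda^3x_{q+1})(y_0+\lambda^3y_{q+1})(z_0+\lambda^3z_{q+1})\end{aligned}$$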

In this trilinear form the $x$-variables are $x_0,x_1,\ldots,x_{q+1}$. Similarly, the number of $y$-variables is $q+2$ and the number of $z$-variables is $q+2$ as well. Define the form

$$F'_q=\sum_{i=1}^{q}(x_0y_iz_i+x_iy_0z_i+x_iy_iz_0)+x_0y_0z_{q+1}+x_0y_{q+1}z_0+x_{q+1}y_0z_0.$$

It is easy to check that the form $F_q$ can be written as $F_q=F'_q+\lambda\cdot F''_q$, where $F''_q$ is a polynomial in $\lambda$ and in the $x$-variables, $y$-variables and $z$-variables. In the language of Section 2 this means that $F'_q\trianglelefteq F_q$ and, informally, this means that an algorithm computing $F_q$ can be converted into an algorithm computing $F'_q$ with the same complexity. Note that $\underline{R}(F_q)\le q+2$ since, by definition, $F_q$ is the sum of $q+2$ products.
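The claim $F_q=F'_q+\lambda\cdot F''_q$ can be checked mechanically. The sketch below stores Laurent polynomials in $\lambda$ as dictionaries `{exponent: coefficient}` (a representation chosen here for illustration), substitutes random integers for $x_0,\ldots,x_{q+1}$, $y_0,\ldots,y_{q+1}$ and $z_0,\ldots,z_{q+1}$, and verifies that all negative powers of $\lambda$ cancel and that the constant coefficient equals $F'_q$.

```python
from collections import defaultdict
import random

def lmul(p, r):
    """Product of two Laurent polynomials {exponent: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in r.items():
            out[e1 + e2] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def ladd(p, r):
    """Sum of two Laurent polynomials."""
    out = defaultdict(int)
    for poly in (p, r):
        for e, c in poly.items():
            out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def F_q(x, y, z, q):
    """F_q as a Laurent polynomial in lambda, for numeric x[0..q+1], y[..], z[..]."""
    total = {}
    for i in range(1, q + 1):
        term = {-2: 1}
        for v in (x, y, z):
            term = lmul(term, {0: v[0], 1: v[i]})
        total = ladd(total, term)
    term = {-3: -1}
    for v in (x, y, z):
        term = lmul(term, {0: v[0], 2: sum(v[1:q + 1])})
    total = ladd(total, term)
    term = {-3: 1, -2: -q}
    for v in (x, y, z):
        term = lmul(term, {0: v[0], 3: v[q + 1]})
    return ladd(total, term)

def F_q_prime(x, y, z, q):
    s = sum(x[0] * y[i] * z[i] + x[i] * y[0] * z[i] + x[i] * y[i] * z[0]
            for i in range(1, q + 1))
    return s + x[0] * y[0] * z[q + 1] + x[0] * y[q + 1] * z[0] + x[q + 1] * y[0] * z[0]

rng = random.Random(0)
q = 3
x, y, z = ([rng.randint(-9, 9) for _ in range(q + 2)] for _ in range(3))
poly = F_q(x, y, z, q)
print(all(e >= 0 for e in poly), poly.get(0, 0) == F_q_prime(x, y, z, q))  # True True
```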

A more complex construction is proposed in Section 8 of [10]. It is obtained by taking the tensor product of $F_q$ by itself. By Proposition 2.1 we know that $F'_q\otimes F'_q\trianglelefteq F_q\otimes F_q$. Consider the tensor product of $F'_q$ by itself:

$$F'_q\otimes F'_q=T_{004}+T_{040}+T_{400}+T_{013}+T_{031}+T_{103}+T_{130}+T_{301}+T_{310}+T_{022}+T_{202}+T_{220}+T_{112}+T_{121}+T_{211},$$

where

$$T_{004}=x_{0,0}\,y_{0,0}\,z_{q+1,q+1},$$

$$T_{013}=\sum_{i=1}^{q}x_{0,0}\,y_{i,0}\,z_{i,q+1}+\sum_{k=1}^{q}x_{0,0}\,y_{0,k}\,z_{q+1,k},$$

$$T_{022}=x_{0,0}\,y_{q+1,0}\,z_{0,q+1}+x_{0,0}\,y_{0,q+1}\,z_{q+1,0}+\sum_{i,k=1}^{q}x_{0,0}\,y_{i,k}\,z_{i,k},$$

$$T_{112}=\sum_{i=1}^{q}x_{i,0}\,y_{i,0}\,z_{0,q+1}+\sum_{k=1}^{q}x_{0,k}\,y_{0,k}\,z_{q+1,0}+\sum_{i,k=1}^{q}x_{i,0}\,y_{0,k}\,z_{i,k}+\sum_{i,k=1}^{q}x_{0,k}\,y_{i,0}\,z_{i,k},$$

and the other eleven terms are obtained by permuting the indices of the $x$-variables, the $y$-variables and the $z$-variables in the above expressions (e.g., $T_{040}=x_{0,0}\,y_{q+1,q+1}\,z_{0,0}$ and $T_{400}=x_{q+1,q+1}\,y_{0,0}\,z_{0,0}$).
Let us describe in more detail the notation used here. The number of $x$-variables is $(q+2)^2$. They are indexed as $x_{i,k}$, for $i,k\in\{0,1,\ldots,q+1\}$, and each of them also carries a superscript, assigned in the following way: the variable $x_{0,0}$ has superscript 0, the variables in $\{x_{i,0},x_{0,k}\}_{1\le i,k\le q}$ have superscript 1, the variables in $\{x_{q+1,0},x_{i,k},x_{0,q+1}\}_{1\le i,k\le q}$ have superscript 2, the variables in $\{x_{q+1,k},x_{i,q+1}\}_{1\le i,k\le q}$ have superscript 3 and the variable $x_{q+1,q+1}$ has superscript 4. Note that the superscript is completely determined by the subscript. Similarly, the number of $y$-variables is $(q+2)^2$, and the number of $z$-variables is $(q+2)^2$ as well. The $y$-variables and the $z$-variables are assigned subscripts and superscripts exactly as for the $x$-variables. Observe that any term $xyz$ that appears in $T_{ijk}$ is such that $x$ has superscript $i$, $y$ has superscript $j$ and $z$ has superscript $k$.
We thus obtain

$$\sum_{\substack{0\le i,j,k\le4\\ i+j+k=4}}T_{ijk}\trianglelefteq F_q\otimes F_q.$$

Moreover we know that $\underline{R}(F_q\otimes F_q)\le(q+2)^2$, since $F_q\otimes F_q$ can be written using $(q+2)^2$ multiplications.
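The grouping by superscripts can be reproduced by brute force. The sketch below encodes each monomial of $F'_q$ by its index triple, forms all products of two monomials, and counts how many land in each $T_{ijk}$ (each single index contributes superscript 0 for index $0$, 1 for indices $1,\ldots,q$ and 2 for index $q+1$, summed over the two factors). For instance it finds $2q$ monomials in $T_{013}$ and $q^2+2$ in $T_{022}$, consistent with the matrix products $\langle1,1,2q\rangle$ and $\langle1,1,q^2+2\rangle$ to which these forms are isomorphic. Helper names are ours.

```python
from collections import Counter
from itertools import product

def F_prime_terms(q):
    """Monomials x_i y_j z_k of F'_q, as index triples (i, j, k)."""
    terms = []
    for i in range(1, q + 1):
        terms += [(0, i, i), (i, 0, i), (i, i, 0)]
    terms += [(0, 0, q + 1), (0, q + 1, 0), (q + 1, 0, 0)]
    return terms

def sup(index, q):
    """Superscript contributed by one index: 0 -> 0, 1..q -> 1, q+1 -> 2."""
    return 0 if index == 0 else (2 if index == q + 1 else 1)

def square_by_superscript(q):
    """Number of monomials of F'_q (x) F'_q in each T_{ijk}, keyed by (i, j, k)."""
    counts = Counter()
    for a, b in product(F_prime_terms(q), repeat=2):
        counts[tuple(sup(a[t], q) + sup(b[t], q) for t in range(3))] += 1
    return counts

c = square_by_superscript(3)
print(len(c), c[(0, 1, 3)], c[(0, 2, 2)], c[(1, 1, 2)])  # 15 6 11 24
```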

We will later need to analyze all the forms $T_{ijk}$. It happens, as observed in [10], that most of these forms (all the forms except $T_{112}$, $T_{121}$ and $T_{211}$) can be analyzed in a straightforward way, since they are isomorphic to the following matrix products:

$$T_{004}\cong T_{040}\cong T_{400}\cong\langle1,1,1\rangle$$

$$T_{013}\cong T_{031}\cong\langle1,1,2q\rangle$$

$$T_{103}\cong T_{301}\cong\langle2q,1,1\rangle$$

$$T_{130}\cong T_{310}\cong\langle1,2q,1\rangle$$

$$T_{022}\cong\langle1,1,q^2+2\rangle$$

$$T_{202}\cong\langle q^2+2,1,1\rangle$$

$$T_{220}\cong\langle1,q^2+2,1\rangle$$

This can be seen from the definition of the trilinear form (or the tensor) corresponding to matrix multiplication described in Section 2. For example, the form $T_{013}$ is isomorphic to the tensor $\sum_{\ell=1}^{2q}x_0y_\ell z_\ell=\langle1,1,2q\rangle$, which represents the product of a $1\times1$ matrix (a scalar) by a $1\times2q$ matrix (a row).

4 Graph-Theoretic Problems 

In this section we describe and solve several graph-theoretic problems that will arise in the analysis of our trilinear forms. While the presentation given here is independent from the remainder of the paper, the reader may prefer to read Section 5 before going through this section.

4.1 Problem setting 

Let $\tau$ be a fixed positive integer. Let $N$ be a large integer and define the set

$$A=\{(I,J,K)\in[\tau]^N\times[\tau]^N\times[\tau]^N\mid I_\ell+J_\ell+K_\ell=\tau\text{ for all }\ell\in\{1,\ldots,N\}\}.$$

Define the three coordinate functions $f_1,f_2,f_3\colon[\tau]^N\times[\tau]^N\times[\tau]^N\to[\tau]^N$ as follows:

$$f_1((I,J,K))=I,\qquad f_2((I,J,K))=J,\qquad f_3((I,J,K))=K.$$

From the definition of $A$, two distinct elements in $A$ cannot agree on more than one coordinate: since $I_\ell+J_\ell+K_\ell=\tau$ for every $\ell$, any two of the sequences $I$, $J$, $K$ determine the third. Since this simple observation will be crucial in our analysis, we state it explicitly as follows.

Fact 1. Let $u$ and $v$ be two elements in $A$. If $f_i(u)=f_i(v)$ for more than one index $i\in\{1,2,3\}$, then $u=v$.
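A small instance of $A$ makes Fact 1 concrete. The sketch below assumes that $[\tau]$ stands for $\{0,1,\ldots,\tau\}$ (the range of the entries is our assumption for illustration), enumerates $A$ for small $\tau$ and $N$, and checks that distinct elements never share two coordinates.

```python
from itertools import combinations, product

def build_A(tau, N):
    """All (I, J, K) with I_l + J_l + K_l = tau at every position l,
    assuming entries range over {0, ..., tau}."""
    per_position = [(a, b, tau - a - b) for a in range(tau + 1) for b in range(tau + 1 - a)]
    A = []
    for choice in product(per_position, repeat=N):
        I, J, K = (tuple(c[t] for c in choice) for t in range(3))
        A.append((I, J, K))
    return A

def fact1_holds(A):
    """No two distinct elements agree on more than one coordinate."""
    return all(sum(u[i] == v[i] for i in range(3)) <= 1 for u, v in combinations(A, 2))

A = build_A(2, 2)
print(len(A), fact1_holds(A))  # 36 True
```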



Let $U$ be a subset of $A$ such that there exist integers $\mathcal{N}_1$, $\mathcal{N}_2$ and $\mathcal{N}_3$ for which the following property holds: for any $I\in[\tau]^N$,

$$|\{u\in U\mid f_1(u)=I\}|\in\{0,\mathcal{N}_1\},$$
$$|\{u\in U\mid f_2(u)=I\}|\in\{0,\mathcal{N}_2\},$$
$$|\{u\in U\mid f_3(u)=I\}|\in\{0,\mathcal{N}_3\}.$$

This means that, for any $i\in\{1,2,3\}$ and any $I\in[\tau]^N$, the size of the set $f_i^{-1}(I)\cap U$ is either $0$ or $\mathcal{N}_i$. Let us write $|f_1(U)|=T_1$, $|f_2(U)|=T_2$ and $|f_3(U)|=T_3$. It is easy to see that

$$|U|=T_1\mathcal{N}_1=T_2\mathcal{N}_2=T_3\mathcal{N}_3.$$

We are interested in stating asymptotic results holding when $N$ goes to infinity. Throughout this section the $\mathcal{N}_i$'s and the $T_i$'s will be strictly increasing functions of $N$.

Let $G$ be the (simple and undirected) graph with vertex set $U$ in which two distinct vertices $u$ and $v$ are connected if and only if there exists one index $i\in\{1,2,3\}$ such that $f_i(u)=f_i(v)$. The goal will be to modify the graph $G$ to obtain a subgraph satisfying some specific properties. The only modification we allow is to remove all the vertices with a given sequence at a given position: given a sequence $I\in[\tau]^N$ and a position $s\in\{1,2,3\}$, remove all the vertices $u$ (if any) such that $f_s(u)=I$. We call such an operation a removal operation. The reason why only such removal operations are allowed is that they will correspond, when considering trilinear forms, to setting some variables to zero, which is one of the only operations that can be performed on trilinear forms.
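The graph $G$ and a removal operation can be sketched as follows, with vertices stored as triples $(I,J,K)$ of tuples and the position $s$ taken in $\{1,2,3\}$ as in the text. The helper names and the toy instance ($U=A$ for $\tau=2$, $N=1$, entries in $\{0,1,2\}$) are hypothetical.

```python
from itertools import combinations

def edges(U):
    """Edges of G: pairs of distinct vertices sharing some coordinate f_i."""
    return {(u, v) for u, v in combinations(sorted(U), 2)
            if any(u[i] == v[i] for i in range(3))}

def remove(U, I, s):
    """Removal operation: delete every vertex u with f_s(u) = I, for s in {1, 2, 3}."""
    return {u for u in U if u[s - 1] != I}

# Toy vertex set: all (I, J, K) with I + J + K = 2 and sequences of length N = 1.
U = {((a,), (b,), (2 - a - b,)) for a in range(3) for b in range(3 - a)}
print(len(U), len(edges(U)), len(remove(U, (0,), 1)))  # 6 12 3
```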

While not stated explicitly in graph-theoretic terms, the key technical result by Coppersmith and Winograd [10] is a method to convert, when $\mathcal{N}_1=\mathcal{N}_2=\mathcal{N}_3$, the graph $G$ into an edgeless graph that still contains a non-negligible fraction of the vertices. We state this result in the following theorem.

Theorem 4.1 ([10]). Suppose that $\mathcal{N}_1=\mathcal{N}_2=\mathcal{N}_3$. Then, for any constant $\epsilon>0$, the graph $G$ can be converted, with only removal operations, into an edgeless graph with $\Omega\left(T_1/\mathcal{N}_1^{\epsilon}\right)$ vertices.



In this section we will give several generalizations of this result. In Subsection 4.2 we state our generalizations. Then, in Subsections 4.3 to 4.6 we describe the algorithms and prove the results. We stress that, in this section as in [10], the number of removal operations (i.e., the time complexity of the algorithms) is irrelevant. This is because, in the applications of these results to matrix multiplication, the parameter $N$ will be treated as a large constant independent of the size of the matrices considered.



4.2 Statement of our results 

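The generalization we consider assumes the existence of a known set $U^*\subseteq U$ such that

• $|f_i(U^*)|=T_i$ for each $i\in\{1,2,3\}$;

• there exist integers $\mathcal{N}_1^*$, $\mathcal{N}_2^*$ and $\mathcal{N}_3^*$ such that

$$|\{u\in U\mid f_i(u)=I\}|=\mathcal{N}_i\quad\text{and}\quad|\{u\in U^*\mid f_i(u)=I\}|=\mathcal{N}_i^*$$

for each $I\in f_i(U^*)$ and each $i\in\{1,2,3\}$.

Note that we necessarily have $T_1\mathcal{N}_1^*=T_2\mathcal{N}_2^*=T_3\mathcal{N}_3^*$ $(=|U^*|)$.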

The first problem considered is again to convert the graph G into an edgeless subgraph that contains 
a non-negligible fraction of the vertices using only removal operations, but we additionally require that 
all the remaining vertices are in U* . Our first result is the following theorem. 
Theorem 4.2. For any constant $\epsilon>0$, the graph $G$ can be converted, with only removal operations, into an edgeless graph with

$$\Omega\left(\frac{T_1\mathcal{N}_1^*}{(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)^{1+\epsilon}}\right)$$

vertices, all of them being in $U^*$.

Theorem 4.1 is a special case of Theorem 4.2 for $U^*=U$. The case $\mathcal{N}_1=\mathcal{N}_2=\mathcal{N}_3$ has been proved implicitly by Stothers [24] and Vassilevska Williams [27].

Our second problem deals with another kind of conversion. Remember that, from the definition of the graph $G$ and Fact 1, for any edge connecting two vertices $u$ and $v$ in $G$ there exists exactly one index $i\in\{1,2,3\}$ such that $f_i(u)=f_i(v)$. Let us define the concept of 1-clique as follows.

Definition 4.1. A 1-clique in the graph $G$ is a set $U'\subseteq U$ for which there exists a sequence $I\in[\tau]^N$ such that $f_1(u)=I$ for all $u\in U'$. The size of the 1-clique is $|U'|$.

Our second result shows how to convert, using only removal operations, the graph $G$ into a graph with vertices in $U^*$ that is the disjoint union of many large 1-cliques. The formal statement follows.

Theorem 4.3. Suppose that $\mathcal{N}_1\ge\mathcal{N}_2\ge\mathcal{N}_3$. Assume that $\frac{\mathcal{N}_2T_1}{\mathcal{N}_1^*}+\frac{\mathcal{N}_2}{T_1}<\frac{1}{1024}$. Then, for any constant $\epsilon>0$, the graph $G$ can be converted, with only removal operations, into a graph satisfying the following conditions:

• all the vertices of the graph are in $U^*$;

• each connected component is a 1-clique (i.e., the graph is a disjoint union of 1-cliques);

• among these 1-cliques, there are $\Omega(T_1/\mathcal{N}_2^{\epsilon})$ 1-cliques that have size $\Omega(\mathcal{N}_1^*/\mathcal{N}_2)$.

Note that, when only removal operations are allowed, the graph obtained is necessarily a subgraph of $G$ induced by a subset of its vertices. Theorem 4.3 thus states that there exist 1-cliques $U_r\subseteq U^*$ such that the graph obtained after the removal operations is the subgraph of $G$ induced by $\bigcup_r U_r$, at least $\Omega(T_1/\mathcal{N}_2^{\epsilon})$ of these $U_r$'s have size $\Omega(\mathcal{N}_1^*/\mathcal{N}_2)$, and for any $r\neq r'$ there is no edge with one extremity in $U_r$ and the other extremity in $U_{r'}$. We mention that the constant $1/1024$ in the assumption of Theorem 4.3 is chosen only for concreteness. The same theorem actually holds even for weaker conditions on $\mathcal{N}_1^*$, $T_1$ and $\mathcal{N}_2$, but this simpler version will be sufficient for our purpose.

4.3 Choice of the weight functions 

Let $M$ be a large prime number that will be chosen later. We take a Salem-Spencer set $\Gamma\subseteq\mathbb{Z}_M$ with $|\Gamma|>M^{1-\epsilon}$, whose existence is guaranteed by Theorem 2.2. Similarly to [10], we take $N+2$ integers $w_0,w_1,\ldots,w_{N+1}$ uniformly at random in $\mathbb{Z}_M=\{0,1,\ldots,M-1\}$ and define three hash functions $b_1,b_2,b_3\colon[\tau]^N\to\mathbb{Z}_M$ as follows.

$$b_1(I)=w_0+\sum_{j=1}^{N}I_jw_j\bmod M,$$

$$b_2(J)=w_{N+1}+\sum_{j=1}^{N}J_jw_j\bmod M,$$

$$b_3(K)=\frac{1}{2}\Big(w_0+w_{N+1}+\sum_{j=1}^{N}(\tau-K_j)w_j\Big)\bmod M.$$
A property of these functions is that, for a fixed sequence $I\in[\tau]^N$, the value $b_i(I)$ is uniformly distributed in $\mathbb{Z}_M$, for each $i\in\{1,2,3\}$. Note that the term $w_0$ is not used in [10], but we introduce it to obtain a uniform distribution even if $I$ is the all-zero sequence.
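The three hash functions can be sketched as below, assuming $b_3(K)=2^{-1}\big(w_0+w_{N+1}+\sum_j(\tau-K_j)w_j\big)\bmod M$, the form for which $b_1(I)+b_2(J)\equiv2b_3(K)\pmod M$ whenever $I_\ell+J_\ell+K_\ell=\tau$ for all $\ell$. `make_hashes` is a hypothetical helper; positions are 0-based in code (position $j$ uses $w_{j+1}$), and $M$ must be an odd prime so that 2 is invertible modulo $M$.

```python
import random

def make_hashes(N, tau, M, rng):
    """Draw w_0, ..., w_{N+1} uniformly from Z_M and return (b1, b2, b3).
    b3 multiplies by the inverse of 2 modulo M, so M must be an odd prime."""
    w = [rng.randrange(M) for _ in range(N + 2)]
    inv2 = pow(2, -1, M)  # modular inverse (Python 3.8+)

    def b1(I):
        return (w[0] + sum(I[j] * w[j + 1] for j in range(N))) % M

    def b2(J):
        return (w[N + 1] + sum(J[j] * w[j + 1] for j in range(N))) % M

    def b3(K):
        return inv2 * (w[0] + w[N + 1] + sum((tau - K[j]) * w[j + 1] for j in range(N))) % M

    return b1, b2, b3

b1, b2, b3 = make_hashes(N=3, tau=2, M=11, rng=random.Random(0))
I, J, K = (0, 1, 2), (1, 1, 0), (1, 0, 0)  # I_l + J_l + K_l = 2 at every position
print((b1(I) + b2(J) - 2 * b3(K)) % 11)    # 0
```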

Let us introduce the notion of a compatible vertex.

Definition 4.2. A vertex $u\in U$ is compatible if $b_i(f_i(u))\in\Gamma$ for each $i\in\{1,2,3\}$. Let $V$ be the set of vertices in $U$ that are compatible, and denote $V^*=V\cap U^*$.

We stress that the definition of compatibility (and thus the definitions of $V$ and $V^*$ too) depends on the choice of $\Gamma$ and of the weights $w_j$. Using the fact that $\Gamma$ is a Salem-Spencer set we can give the following simple but very useful characterization of compatible vertices.

Lemma 4.1. Let $i\in\{1,2,3\}$ and $i'\in\{1,2,3\}$ be any two distinct indices. Then a vertex $u\in U$ is compatible if and only if $b_i(f_i(u))=b_{i'}(f_{i'}(u))=b$ for some $b\in\Gamma$.

Proof. Let us take a vertex $u\in U$, and write $f_1(u)=I$, $f_2(u)=J$ and $f_3(u)=K$.

Note that, since $I_\ell+J_\ell+K_\ell=\tau$ for any index $\ell\in\{1,\ldots,N\}$, the equality $b_1(I)+b_2(J)-2b_3(K)\equiv0\pmod M$ always holds (i.e., holds for all choices of the weights $w_j$).

Suppose that $b_i(f_i(u))=b_{i'}(f_{i'}(u))=b$ for some $b\in\Gamma$, where $i$ and $i'$ are distinct. For instance, suppose that $i=1$ and $i'=2$. Then, since $M$ is an odd prime (so that 2 is invertible modulo $M$), the above property implies that $b_3(K)=b$, which means that $u$ is compatible. The same conclusion is true for the other choices of $i$ and $i'$.

Now suppose that $u$ is compatible. Since $b_1(I)$, $b_2(J)$ and $b_3(K)$ all lie in $\Gamma$ and $b_1(I)+b_2(J)\equiv2b_3(K)\pmod M$, the definition of a Salem-Spencer set implies that there exists an element $b\in\Gamma$ such that $b_1(I)=b_2(J)=b_3(K)=b$. □



4.4 The first pruning

Similarly to [10], the first pruning simply eliminates all the nodes $u\in U$ that are not compatible. Note that this can be done using removal operations: for each $i\in\{1,2,3\}$, we remove all the vertices $w$ such that $f_i(w)\notin b_i^{-1}(\Gamma)$. The vertices remaining are precisely those in $V$. Among those remaining vertices, the vertices in $U^*$ are precisely those in $V^*=V\cap U^*$.

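We now evaluate the expectation of $|V^*|$. The proof is very similar to what was shown in [10] for the case $U=U^*$, and to what was shown in [24, 27] for $\mathcal{N}_1^*=\mathcal{N}_2^*=\mathcal{N}_3^*$.

Lemma 4.2. $\mathbb{E}[|V^*|]=\dfrac{T_1\mathcal{N}_1^*|\Gamma|}{M^2}$.

Proof. We use Lemma 4.1 with $i=1$ and $i'=2$. Remember that $|U^*|=T_1\mathcal{N}_1^*$. For each vertex $u\in U^*$ and each value $b\in\mathbb{Z}_M$, the probability that $b_1(f_1(u))=b_2(f_2(u))=b$ is $1/M^2$. Note that the two events $b_1(f_1(u))=b$ and $b_2(f_2(u))=b$ are independent even when $f_1(u)=f_2(u)$ due to the terms $w_0$ and $w_{N+1}$ in the hash functions. Summing over the $|\Gamma|$ values $b\in\Gamma$ and, by linearity of expectation, over the $T_1\mathcal{N}_1^*$ vertices of $U^*$ gives the claim. □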
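For a toy instance, the expectation in Lemma 4.2 can be computed exactly by enumerating every weight vector. The sketch below uses hypothetical choices: $N=1$, $\tau=2$, $U=U^*=A$ (so $T_1\mathcal{N}_1^*=|A|=6$), $M=5$ and the progression-free set $\Gamma=\{0,1\}$. It assumes $b_3(K)=2^{-1}(w_0+w_2+(\tau-K)w_1)\bmod M$, applies Definition 4.2 literally (all three hash values in $\Gamma$), and compares the exact average number of compatible vertices with $T_1\mathcal{N}_1^*|\Gamma|/M^2$.

```python
from fractions import Fraction
from itertools import product

def exact_expected_compatible(tau, M, Gamma):
    """Exact E[|V|] for N = 1 and U = U* = A, averaging over all (w0, w1, w2) in Z_M^3.
    A vertex (I, J, K) is compatible iff b1(I), b2(J) and b3(K) all lie in Gamma."""
    inv2 = pow(2, -1, M)
    A = [(a, b, tau - a - b) for a in range(tau + 1) for b in range(tau + 1 - a)]
    count = 0
    for w0, w1, w2 in product(range(M), repeat=3):
        for I, J, K in A:
            h1 = (w0 + I * w1) % M
            h2 = (w2 + J * w1) % M
            h3 = inv2 * (w0 + w2 + (tau - K) * w1) % M
            count += (h1 in Gamma) and (h2 in Gamma) and (h3 in Gamma)
    return Fraction(count, M ** 3), len(A)

expected, size_A = exact_expected_compatible(tau=2, M=5, Gamma={0, 1})
print(expected, Fraction(size_A * 2, 5 ** 2))  # 12/25 12/25
```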

Let $E$ be the edge set of the subgraph of $G$ induced by $V$: it consists of all edges connecting two distinct vertices $u$ and $v$ in $V$ such that $f_i(u)=f_i(v)$ for some $i\in\{1,2,3\}$. Let $E'\subseteq E$ be the subset of edges in $E$ with (at least) one extremity in $V^*$. Let $E''\subseteq E'$ be the subset of edges in $E'$ connecting two vertices $u$ and $v$ such that $f_1(u)\neq f_1(v)$ (which means that either $f_2(u)=f_2(v)$ or $f_3(u)=f_3(v)$). The following lemma gives upper bounds on the expectations of $|E'|$ and $|E''|$. The proof can be considered as a generalization of similar statements in [10, 24, 27].

Lemma 4.3. $\mathbb{E}[|E'|]\le\dfrac{T_1\mathcal{N}_1^*(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3-3)|\Gamma|}{M^3}$ and $\mathbb{E}[|E''|]\le\dfrac{T_1\mathcal{N}_1^*(\mathcal{N}_2+\mathcal{N}_3-2)|\Gamma|}{M^3}$.

Proof. We show that, for any index $i\in\{1,2,3\}$, the expected number of ordered pairs $(u,v)$ with $u\in V^*$ and $v\in V\setminus\{u\}$ such that $f_i(u)=f_i(v)$ is exactly

$$\frac{T_1\mathcal{N}_1^*(\mathcal{N}_i-1)|\Gamma|}{M^3}.$$

Since every edge of $E'$ (respectively $E''$) yields at least one such ordered pair for some $i\in\{1,2,3\}$ (respectively $i\in\{2,3\}$), summing over the relevant indices gives the two bounds.

The factor $T_1\mathcal{N}_1^*$ counts the number of vertices in $U^*$. For any vertex $u\in U^*$, there are exactly $\mathcal{N}_i-1$ vertices $v\in U\setminus\{u\}$ such that $f_i(v)=f_i(u)$.

Let $u$ be a vertex in $U^*$ and $v$ be a vertex in $U\setminus\{u\}$ such that $f_i(v)=f_i(u)$. Take another index $i'\in\{1,2,3\}\setminus\{i\}$ arbitrarily. From Lemma 4.1, both $u$ and $v$ are in $V$ if and only if

$$b_i(f_i(u))=b_{i'}(f_{i'}(u))=b_{i'}(f_{i'}(v))=b$$

for some element $b\in\Gamma$. This happens with probability $|\Gamma|/M^3$ since the random variables $b_i(f_i(u))$, $b_{i'}(f_{i'}(u))$ and $b_{i'}(f_{i'}(v))$ are mutually independent. From the linearity of expectation, the expected number of ordered pairs is $T_1\mathcal{N}_1^*(\mathcal{N}_i-1)$ times this probability. □



Lemmas 4.2 and 4.3 focused on expected values and were proven, essentially, by the linearity of expectation. This was sufficient for the applications to square matrix multiplication presented in [10, 24, 27], and this will be sufficient for proving Theorem 4.2 as well. Since, in order to prove Theorem 4.3, we will need a more precise analysis of the behavior of the random variables considered, we now prove the following lemma, which gives a lower bound on the probability that the subgraph induced by $V^*$ has many large 1-cliques.

Lemma 4.4. With probability (over the choice of the weights $w_j$) at least $1-\frac{4MT_1}{\mathcal{N}_1^*}-\frac{4M}{T_1|\Gamma|}$, there exists a set $R\subseteq[\tau]^N$ satisfying the following two conditions:

• $|R|\ge\dfrac{T_1|\Gamma|}{2M}$;

• for any $I\in R$, there exist at least $\dfrac{\mathcal{N}_1^*}{2M}$ vertices $u\in V^*$ such that $f_1(u)=I$.
Proof. Let us consider the set $f_1(U^*)=\{f_1(u)\mid u\in U^*\}$. For each element $I\in f_1(U^*)$, define the set

$$S_I=\{u\in U^*\mid f_1(u)=I\}.$$

Note that $|f_1(U^*)|=T_1$, and $|S_I|=\mathcal{N}_1^*$ for each $I\in f_1(U^*)$.
Fix an element $I\in f_1(U^*)$ and define

$$X_I=|\{u\in S_I\mid b_2(f_2(u))=b_1(I)\}|.$$

This random variable represents the number of vertices from $S_I$ that are mapped by $b_2\circ f_2$ into $b_1(I)$. For any vertex $u=(I,J,K)\in S_I$, we have the equivalence

$$b_2(f_2(u))=b_1(I)\iff\sum_{j=1}^{N}(J_j-I_j)w_j\equiv w_0-w_{N+1}\pmod M.$$

Thus the probability of the event $b_2(f_2(u))=b_1(I)$ is $1/M$, and therefore $\mathbb{E}[X_I]=\mathcal{N}_1^*/M$. Note that, for any two distinct elements $u$ and $v$ in $S_I$, the two events $b_2(f_2(u))=b_1(I)$ and $b_2(f_2(v))=b_1(I)$ are independent. From this pairwise independence, $\mathrm{var}[X_I]=\frac{\mathcal{N}_1^*}{M}\left(1-\frac{1}{M}\right)$. This gives

$$\Pr\left[X_I<\frac{\mathcal{N}_1^*}{2M}\right]\le\Pr\left[\left|X_I-\frac{\mathcal{N}_1^*}{M}\right|>\frac{\mathcal{N}_1^*}{2M}\right]<\frac{4M}{\mathcal{N}_1^*},$$

where the second inequality is obtained by Chebyshev's inequality.
By the union bound we can conclude that

$$\Pr\left[X_I\ge\frac{\mathcal{N}_1^*}{2M}\text{ for all }I\in f_1(U^*)\right]\ge1-\frac{4MT_1}{\mathcal{N}_1^*}.$$
1. $L\leftarrow\emptyset$; $\tilde V\leftarrow V$;

2. While $(\tilde V\cap V^*)\setminus L$ is not empty do:

   2.1 Pick a vertex $u$ in $(\tilde V\cap V^*)\setminus L$;

   2.2 If there is no vertex $v\in\tilde V$ connected with $u$ then add $u$ to $L$;
       Else pick one vertex $v\in\tilde V$ connected with $u$ and do:

       - If $f_1(v)=f_1(u)$, then $\tilde V\leftarrow\tilde V\setminus\{w\in\tilde V\mid f_2(w)=f_2(u)\}$;

       - If $f_2(v)=f_2(u)$, then $\tilde V\leftarrow\tilde V\setminus\{w\in\tilde V\mid f_3(w)=f_3(u)\}$;

       - If $f_3(v)=f_3(u)$, then $\tilde V\leftarrow\tilde V\setminus\{w\in\tilde V\mid f_2(w)=f_2(u)\}$;

3. While $\tilde V\setminus L$ is not empty do:

   Take a vertex $v\in\tilde V\setminus L$ and set $\tilde V\leftarrow\tilde V\setminus\{w\in\tilde V\mid f_2(w)=f_2(v)\}$;

Figure 3: The second pruning (first version)



Define $R=\{I\in f_1(U^*)\mid b_1(I)\in\Gamma\}$. For each element $I\in f_1(U^*)$, the variable $b_1(I)$ is distributed uniformly at random in $\mathbb{Z}_M$. Moreover, the variables $b_1(I)$ are pairwise independent. Thus, by Chebyshev's inequality, we obtain

$$\Pr\left[|R|\ge\frac{T_1|\Gamma|}{2M}\right]\ge1-\frac{4M}{T_1|\Gamma|}.$$

From the union bound we can conclude that, with probability at least $1-\frac{4MT_1}{\mathcal{N}_1^*}-\frac{4M}{T_1|\Gamma|}$, the inequality $|R|\ge T_1|\Gamma|/(2M)$ holds and simultaneously, for each element $I\in R$, there exist at least $\mathcal{N}_1^*/(2M)$ vertices $u\in U^*$ such that $f_1(u)=I$ and $b_2(f_2(u))=b_1(I)$. These vertices are in $V^*$ by Lemma 4.1. □



4.5 The second pruning: first version and proof of Theorem 4.2

In this subsection $M$ is an arbitrary prime number such that $2(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)<M<4(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)$.
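Such a prime always exists by Bertrand's postulate, since the interval $(2S,4S)$ with $S=\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3$ has the form $(n,2n)$. A minimal sketch for finding one by trial division follows; this is adequate for illustration-sized inputs, and the values of the $\mathcal{N}_i$ below are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for illustration-sized inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_strictly_between(lo, hi):
    """Smallest prime M with lo < M < hi; raises ValueError if there is none."""
    for M in range(lo + 1, hi):
        if is_prime(M):
            return M
    raise ValueError(f"no prime strictly between {lo} and {hi}")

N1, N2, N3 = 5, 4, 3  # hypothetical values of the N_i
S = N1 + N2 + N3
print(prime_strictly_between(2 * S, 4 * S))  # 29
```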

The first pruning has transformed the graph $G$ into the subgraph induced by $V$. The second pruning, similarly to [10], will further modify this subgraph by removing vertices in order to obtain a subgraph consisting of isolated vertices from $V^*$ (i.e., an edgeless graph). This is done by greedily constructing a set $L\subseteq V^*$ of isolated vertices. Initially $L=\emptyset$ and, at each iteration, either one remaining vertex in $V^*\setminus L$ will be added to $L$ or several vertices in $V$ will be removed. This will be repeated until there is no remaining vertex in $V^*\setminus L$. Finally, all the remaining vertices not in $L$ will be removed. The detailed procedure is described in Figure 3, where $\tilde V$ represents the set of remaining vertices (initially $\tilde V=V$). Note that the procedure slightly differs from what was done in [10, 24, 27] since we need to take into consideration the asymmetry of the problem.
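The procedure of Figure 3 can be sketched in Python as follows. This is an illustration under our own conventions, not the authors' implementation: vertices are triples $(I,J,K)$ of tuples, $f_1,f_2,f_3$ are positions 0, 1, 2, and the arbitrary choices in Steps 2.1, 2.2 and 3 are resolved by taking the smallest vertex in sorted order.

```python
def shared_index(u, v):
    """0-based coordinate on which u and v agree, or None (at most one by Fact 1)."""
    for pos in range(3):
        if u[pos] == v[pos]:
            return pos
    return None

def second_pruning_first_version(V, V_star):
    """Sketch of Figure 3; returns the final contents of the working set."""
    Vt, L = set(V), set()                    # Step 1
    while True:                              # Step 2
        candidates = sorted((Vt & V_star) - L)
        if not candidates:
            break
        u = candidates[0]                    # Step 2.1
        nbrs = sorted(v for v in Vt if v != u and shared_index(u, v) is not None)
        if not nbrs:                         # Step 2.2: u is isolated
            L.add(u)
            continue
        i = shared_index(u, nbrs[0])
        # f_1 or f_3 shared: remove all w with f_2(w) = f_2(u);
        # f_2 shared: remove all w with f_3(w) = f_3(u).
        pos = 2 if i == 1 else 1
        Vt = {w for w in Vt if w[pos] != u[pos]}
    while Vt - L:                            # Step 3
        v = min(Vt - L)
        Vt = {w for w in Vt if w[1] != v[1]}
    return Vt

A = {((a,), (b,), (2 - a - b,)) for a in range(3) for b in range(3 - a)}
print(second_pruning_first_version(A, A))  # {((0,), (2,), (0,))}
```

As argued in the proof that follows, the final set consists exactly of the vertices placed in $L$, so it lies in $V^*$ and is edgeless.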

Let $V_f$ denote the contents of $\tilde V$ at the end of the procedure. The following proposition, shown using the same ideas as in [10], shows that what we obtain is a large set of isolated vertices from $V^*$.

Proposition 4.1. The subgraph of $G$ induced by $V_f$ is an edgeless graph. Moreover, $V_f\subseteq V^*$ and the expectation of $|V_f|$ (over the choices of $w_j$) is

$$\Omega\left(\frac{T_1\mathcal{N}_1^*}{(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)^{1+\epsilon}}\right).$$
Proof. Let $L_f$ denote the contents of $L$ at the end of the procedure. First observe that $V_f\subseteq L_f$, due to Step 3. Note that any vertex added to $L$ cannot later be removed from $\tilde V$, since it has no neighbor. Thus $L_f\subseteq V_f$, and we conclude that $V_f=L_f$. This shows in particular that $V_f\subseteq V^*$. Moreover, since each vertex in $L$ has no neighbor, the subgraph induced by $V_f$ is edgeless.

To prove the second part, we will show an upper bound on the number of vertices in $V^*$ removed from $\tilde V$ during the loop of Step 2. The bound will be obtained by considering the number of edges from $E'$ remaining in the subgraph induced by $\tilde V$.

Let us consider what happens during Step 2.2. Let $u\in(\tilde V\cap V^*)\setminus L$ be the vertex currently examined. Suppose that another vertex in $\tilde V$ sharing one index with $u$ is found. For example, suppose that we find another vertex $v\in\tilde V$ with $f_1(v)=f_1(u)$. Let $S=\{w\in\tilde V\cap V^*\mid f_2(w)=f_2(u)\}$ be the set of vertices in $V^*$ eliminated by the consequent removal operation. Observe that this removal operation will eliminate at least $\binom{|S|}{2}+1$ new edges from $E'$: the edges between two vertices in $S$, and the edge connecting $u$ and $v$. Since $\binom{|S|}{2}+1\ge|S|$, the number of vertices in $V^*$ removed during one execution of Step 2.2 is at most the number of edges eliminated from $E'$.

The total number of vertices from $V^*$ that are removed by the procedure during the loop of Step 2 is thus at most $|E'|$, which means that $|V_f\cap V^*|\ge|V^*|-|E'|$. Since $V_f\subseteq V^*$, Lemmas 4.2 and 4.3 imply that the expected number of vertices in $V_f$ is at least

$$\frac{T_1\mathcal{N}_1^*|\Gamma|}{M^2}\left(1-\frac{\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3-3}{M}\right)\ge\frac{T_1\mathcal{N}_1^*|\Gamma|}{2M^2},$$

where the inequality follows from the choice of $M$. Since $M=O(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)$ and $|\Gamma|>M^{1-\epsilon}$, we conclude that $\mathbb{E}[|V_f|]=\Omega\left(\frac{T_1\mathcal{N}_1^*}{(\mathcal{N}_1+\mathcal{N}_2+\mathcal{N}_3)^{1+\epsilon}}\right)$. □

Theorem 14.21 follows from Proposition 14. 1 1 by fixing a choice of the weights loq,uji, . . . ,ujn+i for 
which \ Vf\ > E[| VA] (such a choice necessarily exists from the definition of the expectation). 

4.6 The second pruning: second version and proof of Theorem l43l 

In this subsection M is an arbitrary prime such that 64A/2 < M < I2&A/2. 

The first version of the pruning described in Subsection 14.51 was designed to obtain an edgeless 
subgraph of G. In this subsection we describe how to modify it to obtain a union of many large 1- 
cliques instead. The detailed procedure of the new pruning algorithm is described in Figure |4] The only 
difference is that, at Step 2.2, a vertex is not added in L only if it is connected to another vertex with the 
same second or third index. 

Let Vf denote the contents of V at the end of the procedure. By slightly modifying the arguments 
of the previous subsection, it is easy to see that the resulting graph has only vertices from V* and is a 
disjoint union of 1-cliques (i.e., each connected component is a 1-clique). 

Proposition 4.2. The subgraph of G induced by Vf is a disjoint union of 1-cliques. Moreover, Vf C V* 
and\Vf\>\V*\-\E"\. 

Proof. Let L f denote the contents of L at the end of the procedure. Due to Step 3, we know that Vf C Lf. 
Moreover, any vertex added to L cannot be later removed from V, since it has no neighbor with the same 
second or third index. Thus Lf C Vf, and we conclude that Vf = Lf, which shows in particular that 
Vf C V* . Furthermore, since each vertex in L has no neighbor with the same second or third index, the 
subgraph induced by Vf is a disjoint union of 1-cliques. 

The inequality \Vt\ > \V*\ — \E"\ is obtained as in the proof of Proposition 14.11 but replacing the 
edge set E' by E" . □ 

We now give the proof of Theorem 14.31 by using Lemmas 14.31 and 14.41 to evaluate the size and the 
numbers of 1-cliques in the resulting graph. 
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1. L ^ 0;F ^V; 

2. While (V DV*)\L is not empty do: 

2.1 Pick a vertex um (V t~) V*)\L; 

2.2 If there is no vertex v G V with = for some i G {2,3} then add u to L; 
Else pick one vertex u G V with fi(u) = fi(v) for some % G {2, 3} and do: 

- If h(v) = f 2 (u), thenF <- V\{w G V \ h(w) = f 3 (u)}; 

- If fs(v) = h{u), then V <- V\{w G V \ f 2 (w) = f 2 (u)}; 

3. While V\L is not empty do: 

Take a vertex v G F\L and set V <- V\{w e V \ f 2 (w) = /2(f)}; 

Figure 4: The second pruning (second version) 



Proof of Theorem \473\ Using Lemma 1431 with the value M > 64M 2 and Markov's bound, we conclude 
that 

1 



Pr 



< ^iri 



> 

- 2 



16M 2 

Note that the conditions M < 1287V 2 and + ^ < 1^4 im Ply tnat 

> 1 - 512 — i-i + — 7^7 > 1 - 512 — i-i + — - > 



A/"* Ti|T| \ Af* Ti|r|y V -A/"* T i J 2 ' 

Thus the probability that a set R as in Lemma l4~4l exists and the inequality \E"\ < \ e ^ I2 simulta- 
neously holds is positive. There then exists a choice of the weights ujq, coi, . . . , w^v+i such that this 
happens. Let us take such a choice. 

From Proposition 14.21 we know that at most \E"\ vertices in V* are removed by the second pruning. 
Then there exists a set R' C R of size \R'\ > Ti|r|/(4M) such that, for any r' G R' , there are 
at least AT* /(AM) vertices u with fi(u) = r' remaining after the second pruning. This is because, 
otherwise, from the properties of the set R stated in Lemma l4~4l it would be necessary to remove more 
than TiAf* |r|/(16M 2 ) vertices during the second pruning. □ 

5 Algorithm for Rectangular Matrix Multiplication 

In this section we present our algorithm, which essentially consists in the two algorithmic steps described 
in Subsections I5.2l and l5.4l We first start by explaining the construction we will use. 

5.1 Our construction 

Let aoo4 ; ^400) a oi3 ; ai03) ^301) oo 22 , a 2 o 2 , «ii 2 , a 2 n be nine arbitrary positive^ rational numbers such 
that 

2aoo4 + CJ400 + 2aoi3 + 2aio3 + 20301 + ao 22 + 2a 2 Q 2 + 2an 2 + a 2 u = 1 



'The hypothesis that each aijk is not zero is made only for convenience (all the bounds presented in this paper are obtained 
using positive values for these parameters). More specifically, this hypothesis is used only when approximating quantities like 
(a,ijkN)\ using Stirling's inequality. Without the hypothesis it would be necessary to treat the (trivial) case aijk ~ separately. 
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and 

0013^2020112 = 010300220211- 

Let us define rational numbers Aq, A\, A2, A3, A4, Bo, B\, B2, B^,B^ as follows: 

Aq = 2aoo4 + 2aoi3 + 0022 

Ax = 2aio3 + 2an2 

A2 = 20202 + 0211 

-43 = 2a 30 i 

A4 = 0400 

-Bo = 0004 + 0400 + 0103 + 0301 + 0202 

B\ = 0013 + 0301 + an2 + 0211 

B2 = O022 + O202 + Oll2 

B3 = aoi3 + 0103 

B4 = O004- 

Note that Eto A i = E?=o B i = 1 - 

It will be convenient to define six additional numbers 0040, 0031, 0130, 0310, 0220 an d 0121 as 0040 = 
0004, a 3i = a i3, 0130 = aio3, a 3 i = a 30 i, a 2 20 = 0202 and a m = a U2 . We can then rewrite 
concisely the Aj's and the Bj's as follows. 

Ai = ^ a ijk fori = 0,1,2,3,4 
0<;',fc<4 

i+j+k=A 

B i = a v k forj = °' 1 ' 2 ' 3 ' 4 

0<j,fc<4 
i+j+fc=4 

Let iV be a large enough positive integer such each iVa^fc is an integer. We rise the construction 
F q ® F q described in Sectionglto the iV-th power. Observe that (F q ® F' q )® N < (F q ® F g ) 0Ar and 

(F g ®FX N = Yl TlJK > 

UK 

where the sum is over all triples of sequences UK with I, J,K £ {0, 1, 2, 3, 4}^ such that J| + Ji + 
JQ = 4 for all £ G {1, . . . , N}. Here we use the notation T IJK = Ti lJlKl ® • • • <g) T In j nKn . Note 
that there are 15^ terms T/xr- in the above sum. In the tensor product the number of x-variables is 
(q + 2) 2N . The number of y-variables and z-variables is also (q + 2) 2N . Remember that in the original 
construction, each x-variable was indexed by a superscript in {0, 1, 2, 3, 4}. Each x-variable in the tensor 
product is thus indexed by a sequence of N such superscripts, i.e., by an element / £ {0, 1, 2, 3, 4}^. 
The same is true for the y-variables and the z-variables. Note that the x-variables appearing in Tjjk 
have superscript /, the y-variables appearing in Tjjk have superscript J, and the z-variables appearing 
in Tjjk have superscript K. 

Let us introduce the following definition. 

Definition 5.1. Let aoo4, 0040, 0400. 0013, 0031, aio3» 0130, 0301, 0310, 0022, 0202, 0220* 0112, am, 0211 
be fifteen nonnegative rational numbers. We say that a triple UK is of type [ajjfc] if 

\{£e{l,...,N}\I e = i,Ji=j and K e = k}\ = a ijk N 

for all 15 combinations of positive i, j, k with i + j + k = 4. 

With a slight abuse of notation, we will say that a form Tjjk is of type [aijk] if the triple UK is of 
type [aijk]. 
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5.2 The first step 

We set to zero all x- variables except those satisfying the following condition: their superscript / has 
exactly ^4o^ coordinates with value 0, A\N coordinates with value 1, A2N coordinates with value 2, 
A 3 N coordinates with value 3 and A4N coordinates with value 4. We will say that such a sequence / is 
of type A. There are 



N 




A N,...,A 4 Nj \ N2 \ Aq° Af 1 Ap A$ 3 A^ 



sequences / of type A (the approximation is done using Stirling's formula). After the zeroing operation, 
all forms Tijk such that / is not of type A disappear (i.e., become zero). 

We process the y-variables and the z-variables slightly differently. We set to zero all y-variables 
except those satisfying the following condition: their superscript J has exactly BqN coordinates with 
value 0, BiN coordinates with value 1, B%N coordinates with value 2, B 3 N coordinates with value 3 
and B4N coordinates with value 4. We will say that such a sequence is of type B. There are 



N \ „ / 1 / 1 \ 




B N, B 4 Nj " 1 N 2 \B*°B^BS*B* s Bf* 



sequences J of type B. Similarly, we set to zero all z-variables except those such that their superscript 
K is of type B (there are Ty such sequences). 

After these three zeroing operations, the forms Tijk remaining are precisely those such that / is of 
type A, J is of type B, and K is of type B. Equivalently, the forms remaining are precisely the forms 
Tijk that are of type [aijk] with fifteen numbers a^k (for all fifteen combinations of positive i, j, k such 
that i + j ; + k = 4) satisfying the following four conditions: 

a ijk N G {0,1,... ,N} for alii, j,fc; (4) 
Ai= fori = 0,1, 2, 3, 4; (5) 

j,k : i+j+k=4 

B j= Yl for j = 0,1,2,3,4; (6) 

i,k : i+j+k=A 

B k = Yl ^ k for fc = 0,1, 2, 3, 4. (7) 

i,j : i+j+k=A 

Let / be a fixed sequence of type A. The number of non-zeros forms Tijk with this sequence / as 
its first index is thus precisely 

X r r^, V«004iV, a 040 N, a i3-/V, a 031 N, ao22-/V/ V a i03^V, ai 30 N, a U2 N, a 12 \N 

[ a ijk\ 

A 2 N \/ A 3 N \/A 4 N 

a202N,a 2 2oN,a 2 iiN J \a 301 N,a 3 i N J \a AO oN 

where the sum is over all the choices of fifteen parameters Safe's satisfying conditions ©-fT]). 

For a fixed sequence J of type B, the number of non-zeros forms Tijk with this sequence J as its 
second index is 



«Y = E 



B N \ ( B\N 

a 00i N, awoN, a W3 N, a 30 iN, a 2 Q2 NJ \~a~013, a 3W , am, chuN 



B 2 N \/ B 3 N \/B 4 N s 

K a 22N,a22oN,ai2iNj \a 031 N,ai 30 NJ \a040-W, 
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where the sum is again over all the choices of fifteen parameters a^'s satisfying conditions ©-(Q. 
Similarly, for a fixed sequence K of type B, the number of non-zeros forms Tjjk with this sequence K 
as its third index is 



Hz 



EL 



B n N 



a AoN, a^ooN, ai 30 N, a 3 i N, a 2 2oNJ \a 3i, 0301,0121, a 2n N 



B 2 N 



f B 4 N 

ao22N,a 2 o2N,a 1 i 2 Nj \a 01 sN,a W 3Nj V«004^ 



The total number of remaining triples is 

T X H X = T Y M Y = T Y N Z . 



(8) 



Note that this implies that Wy = Hz- 

We will also be interested in the number of remaining forms Tjjk of type [oyjfc]. For a fixed se- 
quence I of type A, the number of non-zeros forms Tjjk of type [a^] with this sequence / as its first 
index is 

Mi = ( A ' N )( AlN ) x 

\a004N, CL004N, a i 3 N, a i 3 N, a 02 2^V J \a103N, ai 03 iV, an 2 N, a 112 N J 



A 2 N \ ( A 3 N 

0202^,0202^, a 2 iiNj \a301-W, a 30 iiV 



J \a400NJ ' 



For a fixed sequence J of type B, the number of non-zeros forms Tuk of type [a^] with this sequence 
J as its second index is 



H 



Y 



B N \ / B X N 

a 004: N, a i00 N, a W3 N, a 30 iN, a 2 02NJ V«013, a 30 i, a 112 , a 2 nN 

B 2 N \ f B3N \ ( B 4 N \ 



We have 



O022iV, 0202^, a 112 Nj Vaoi3^, CL103NJ \a004NJ ' 
T X H X = T Y H Y - (9) 



5.3 Approximation 

In this subsection we will use the notation [oyfc] to represent an arbitrary set of fifteen parameters 
such that < a^-fc < 1 for each i, j, k. Let c([ayfc]) denote the number of nonzero elements among these 
fifteen parameters. Consider the following expression: 

n (\-TT 1\ /77a004 7 7ffl040 7 7a400 77«013 7 7a03l77ffll03 7 7ai30 77 ^a30l77ffl310 7 7a022 7r 5:20277a22077ail2 77ai2l77a211 N 

y\[ a ijk\) — ^«004 u 040 u 400 a 013 u 031 u 103 u 130 a 301 u 310 u 022 a 202 u 220 u 112 u 121 u 211 

Using Stirling's formula, we can give the following approximations. 

/ 



H 



X 



e 



E 



[A^A^A^A^A^ x g([a ijk \) 



_/\rMK/fc])-5)/2 



H 



Y 



e 



/ [B*>B*B*Bl'B* x g([a ijk )) 



H* x 
Hp 



E 



91 w 



N ■ 



_/V(c([a ijfe ])-5)/2 

A^A^A^A^At x g([a ijk \) 



B^B^B^BpB^ x g([a ijk }) 
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The first two sums are again over all the choices of fifteen parameters a^'s satisfying conditions ©HQ. 
Note that, for any [a^k] satisfying these four conditions, c([a^fc]) > 5, since the Aj's are non-zero. 

We know that Mx < A6c an d M Y < My, by definition. The following proposition shows that Mx 
and My can actually be approximated by M x and M Y , respectively. 

Proposition 5.1. Mx = 0(N 8 M x ) and My = 0(N s M Y ). 

Proof. Any set of values that satisfies conditions (HI)-© is such that aoo4 = 0004, 0040 = a 040 an d 
^400 = O4oo- Moreover, the other values a^ depend only on aio3, ab3i an d «3oi : 



«013 


= B 3 N - aio3 


«130 


= B 3 N - a 03 i 


«310 


= A 3 N - a 30 i 


«022 


= (Ao - 2B 4 - B 3 )N + ai 03 - a 03 i 


«202 


= (-A4 + B Q - B 4 )N - a 103 - 0301 


«220 


= {-A 3 -A 4 + B -B 3 - B 4 )N + a 03 i + a 30 i 


am 


= (-A + A 4 - B + B 2 + B 3 + 3B 4 )N + a 03 i + a 30 i 


am 


= (-A + A 3 + A 4 - B + B 2 + 2£ 3 + 3B 4 )N - a W3 - a 301 


0211 


= (A 2 + A 3 + 2A 4 - 2B + B 3 + 2B 4 )N - a 031 + a W3 . 



Note that there are at most (N + 1) choices for each 0103, ao3i an d «3oi> from condition (0]). There are 
thus at most (N + l) 3 choices for the values [a^k] satisfying conditions ©-©. 

We now show that the expression g([aijk]) is maximized for the values [a^k] = [o-ijk]- Let us take 
the logarithm of the expression g. Since this is a concave function on a convex domain, a local optimum 
of log g is a global maximum of g. The partial derivatives of log g are as follows. 

log(a i 3 ) - log(aio3) - log(a 022 ) + log(a 202 ) + log(ai 2 i) - log(a 2 n) 
log(a 03 i) + log(ai 30 ) + log(a 022 ) - log(a 22 o) - log (am) + log(a 2 n) 
log(fl30i) + log(a3io) + log(a 202 ) - log(a 220 ) - log(am) + log(a m ) 



dlogg 

9aio3 
d\ogg 

da 3i 
d\ogg 

da 3 oi 



The values [a ijk ] = [a ijk ] satisfy = |^ = |^ = since a i3a 202 am = ai 3ao22«2H 

We conclude that 

Nx = f(N + l) 3 [a^A^A^A^A^ x g([a ijk }) 
and similarly My = O (N 8 M Y ). 



N 



O (N S M X ) 



□ 



5.4 The second step 

We will now apply the results of Section HI by associating to U the set of all forms Tjjk that are of 
type [aijk] f° r values ayj. satisfying conditions ©-(17]), and to U* the set of all forms Ttjk of type 
[aijk]- Note that all the conditions of Subsection 14. II are satisfied: we have r = 4, and values T\ = T x , 
T 2 = T 3 = T Y , Mi = M x , M 2 =M 3 = M Y , M* = M* x and A" 2 * = A/" 3 * = Ay. 

With this association we can use the graph-theoretic framework developed in Section |4] The initial 
graph G, as defined in Subsection 14.11 corresponds to the current sum of trilinear forms (each vertex 
corresponds to a form Tjjk of type [a^k] for values oyj. satisfying conditions ©-©). A removal 
operation on G corresponds to zeroing variables with a given superscript. For instance, removing all 
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vertices u G G such that fi (u) = I corresponds to zeroing all the x-variables with superscript /. Our 
goal is to zero variables in order to obtain a sum of several forms Tjjk satisfying the following two 
conditions. First, each form in the sum is of type [a^]. Then, the forms in the sum do not share any 
index (i.e., if Tjjk and Tpjij^i are in the sum, then I ^ I' , J ^ J' and K / K'), which implies that the 
same variable does not appear in more than one form, and thus that the sum is direct. This corresponds 
to constructing an edgeless subgraph of G in which all the vertices are in U* , so we are in a situation 
where Theorem I4.2l can be applied. 
Suppose that the inequality 

Aq° Af 1 A% 2 A3 3 Af 4 > B^B^B^B^Bf 4 

holds. Then T x = 0(T Y ), and Equalities ® and © imply that M Y = 0{M X ) and M Y = 0{M X ). 
From Proposition 15. 1 1 we then obtain the relation Mi + M2 + A/3 = 0(N 8 N x ). By the above discussion 
and Theorem 14.21 we can obtain a direct sum of 



n 



( T X N X 



forms, all of type [a^]. By using the trivial upper bound M x < 15 , we obtain the following theorem. 

Theorem 5.1. Let q be any positive integer and aoo4, 0400 0013, 0103, 0301. a 022, 0202. 0112 and 0211 be 
any nine positive rational numbers satisfying the following three conditions: 

• 2aoo4 + CJ400 + 2aoi3 + 2aio3 + 2a3oi + 0022 + 2a202 + 2an2 + 0211 = 1/ 

• 001302020112 = 010300220211/ 

• A^A^A^A^Af 4 > Bq° 1? f 1 B2 2 Bf 3 Bf 4 . 

Then, for any constant e > 0, the trilinear form (F q ® F q )® N can be converted (i.e., degenerated in the 
sense of Definition I2.il ) into a direct sum of 



n 



1 



aA aAi aA 2 aA 3 aAa 
^0 ^1 A 2 ^3 ^4 



forms, each form being isomorphic to 



ijk 



N 



0<«J,fc<4 
i+j+k=i 



6 Upper Bounds on the Exponent of Rectangular Matrix Multiplication 

Theorem [5j] showed how the form (F q (g) F q )® N can be used to obtain a direct sum of many forms Tjjk 
such that 

Tuk = <g) T^ N . 

i,j,k:i+j+k=4 

In order to apply Schonhage's asymptotic sum inequality (Theorem 12. lb . we need to analyze the smaller 
forms Tjjfc. All the forms except T112, Im and T211 correspond to matrix multiplications, as described 
in Section[3] In Subsection 16. II we analyze the forms T112, Tm and T211. Then, in Subsection 16.21 we 
put all our results together and prove our main result. 
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6.1 The forms T 112 , T 121 and T 211 

Let us recall the definition of these three forms. 

g g g g 

Tll2 = X i,0yi,0 Z 0,q+l + ^2 x 0,ky0,k z q+l,0 + ^2 X i,0y0,k z i,k + ^ X 0,kUi,O z i,k 

i=l k=l i,k=l i,k=l 

g g g g 

Tl21 = ^,02/0,9+1^,0 + 5^ X 0,fcyg+l,0 z 0,fc + ^ X \fiyi,k z 0,k + x 0,kyf,k z i,0 

j=l fc=l i,k=l i,k=l 

g g g g 

T2U ~ ^2 X 0,g+iyi,0 z i,0 + ^ x q+l,oyo,A; z O,fc + x l,kyi,0 z 0,k + x i,kyO,k z i,Q 

i=l k=l i,k=l i,k=l 

We first focus on the form I211. It will be convenient to write T 2 n = ton + *ioi + £110 + *200> where 

g 



ton = ^J^o, 9 +i2/i,o^i,Oi 
i=i 
g 

hoi = ^2 x \,kyo,k z i,Oi 

i,k=l 

g 

hio = ^2 x \kylfi z o,ki 

i,k=l 

g 

E2 

x q+l,0y0,k z 0,k- 



fe=l 

Note that the superscripts in these forms differ from the original superscripts. They are nevertheless 
uniquely determined by the subscripts of the variables. Observe that ion — ^200 — (lj !><?)> which 
corresponds to the product of a scalar by a row vector, and hoi = hio — {q, Q, 1)> which corresponds to 
the product of a q x q matrix by a column vector. 

The following proposition states that tensor powers of T 2 u can be used to construct a direct sum 
of several trilinear forms, each one being a 'rf -tensor in which the support and all the components are 
isomorphic to a rectangular matrix product. 

Proposition 6.1. Let b be any constant such that 0.916027 < 6 < 1. Then there exists a constant c > 1 
depending only on b such that, for any e > and any large enough integer m, the form T^ 2 " 1 can be 
converted into a direct sum of 



1 



mc 2em 



1 2ttl 



(2b) b {l-b) 1 - b 

trilinear forms, each form being a & -tensor in which: 

• each component is isomorphic to (q 2bm , q 2bm , q 2 ( 1-b ) m ); 

• the support is isomorphic to supp c ((l, 1, H)), where H = Vt • [(26) 6 (1 — 6)( 1-6 )] 2m 

Remark. Proposition 16 . 1 1 uses the convention 0° = 1. For the case 6 = 1, the proposition states that the 
form T^ 2 ,™ 1 can be used to construct at least one tensor with support isomorphic to supp c ((l, 1, H}) for 
H = Q (4 m /\/m), each component being isomorphic to (q 2m , q 2m , 1). 

Proof of Proposition \6. 1 1 For simplicity we suppose that bm is an integer (otherwise, we can work with 
[bm\ , which gives the same asymptotic complexity). 
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LetS" be the set of all triples UK with I G {0,l,2} 2m and J,K£ {0, l} 2m such that I t +J t +K t 
2 for all £ € {1, ... , 2m}. We rise the form T%\\ to the 2m-th power. This gives the form 



E 

ijkgs 



tux 



where = ti 1 j 1 K 1 <8> • • • ® ^Ii m J%mKi m - Each x-variable in the tensor product has for superscript 
a sequence in {0, 1, 2} 2m , and each y-variable or z-variable has for superscript a sequence in {0, l} 2m . 
Let us decompose the space of x-variables as ©/ e | i 2 } 2m ^-i, where Xj denotes the subspace of x- 
variables with superscript I. Similarly, decompose the space of y-variables as ©j e | nam Yj and the 
space of z-variables as ©^- e | nam Zk- The form ^ tijK above is then a ^-tensor with respect to this 
decomposition, with support S. We will now modify this form (by zeroing variables, which will modify 
its support) in order to obtain a simple expression for each of its components. 

We zero all the x- variables except those for which the superscript I has (1 — 6)m coordinates with 
value 0, (1 — 6)m coordinates with value 2 and 26m coordinates with value 1. We zero all the y-variables 
and z- variables except those for which the superscript has m coordinates with value and m coordinates 
with value 1. After these zeroing operations, the only forms remaining in the sum are those corresponding 
to indexes UK satisfying the following four conditions. 



{*€{!, 

{le{l, 

{*€{!, 



• , 2m} I h 

• , 2m} | h 
■ , 2m} | I e 

• , 2m} | h 



0,J e 
2,J e 



1 and K e = 1} 

and K t = 1} 

1 and K e = 0} 
and K t = 0} 



(1 — b)m 

bra 

bm 

(1 — b)m 



This means that each form tjjjc in this new sum (i.e., each component in the corresponding ^-tensor) is 
isomorphic to 



.(gi(l-b)m ,®6 m ,®fe m ,(g)(l-fe)m 
c 011 59 r 101 59 r 110 69 r 200 



/26m 2bm 2(l-b)m\ 
W j " j " / 



We now analyze the support of the new sum (the decomposition considered is unchanged). 

The case 6 = 1 can be analyzed directly: there are ( 2 ^) = (4 m /y / m) forms in the sum, all of 
them sharing the same first index (the all-one sequence 1 = 1 • • • 1). This sum is then J2j tiJK, where 
for each t\jK the sequence K is uniquely determined by J. The support of the sum is thus isomorphic 

to supp c (<l, 1, ( 2 3). 

To analyze the case 6 < 1, we will interpret the sum in the framework developed in Section |4l by 
letting U be the set of triples UK satisfying the above four conditions. Indeed, all the requirements for 
U are satisfied: we have r = 2 and 



Tl 

M 

T 2 =T 3 
A/2 = A/3 



2m 

(1 — 6)m, (1 — 6)m, 2m6 



6 



1 



??? 



(26)>(1 - b) 



l-b 



2m N 



2m6 
m6 
2m 
m 



= G 

e 



1 



2m 



m 



[2] 



2)ii 



m \ / m 
m(l -b)J \m(l - 6) 



e 



in 



66(1-6)1-6 



2tyi 



Note in particular that Ni > N 2 , since (26) fe (l - b) 1 ^ > 1 for any 6 > 0.773. Choose U* = U (which 
means that Mi = N* for each i G {1,2, 3}). The correspondence with the graph-theoretic interpretation 
of Section |4] is as follows. Each vertex in the graph G defined in Subsection 14. II corresponds to one 
form Tjjk in the sum, which is isomorphic to (q 2bm ,q 2bm , q 2 ^~ b > m ) from the discussion above. One 



28 



removal operation corresponds to zeroing variables. A 1 -clique of length n corresponds to a sum of n 
forms sharing their first index, which is precisely a "^-tensor with support isomorphic to supp c ((l, 1, n)). 
For any value b > 0.916027 we have 



{2b) 2b {\ - b) 2 ^~ b ) 



< 1 



and thus T1A/2/A/1 = o(l). By Theorem 14.31 for any e > we can then convert the sum into a direct 
sum of 

2 



9 



1 



m 



l-e 



2 b (b b (i-by- b ) 



l-iA 1 " 6 



2m \ 



-tensors, each tensor having support isomorphic to supp c ((l, 1, n)) for 



n = Q. 



M 2 



n 



m ■ 



(2b) b (l 



2 m 



and components isomorphic to (q 2bm , q 2bm , q 2 ^ 1 b ) m ). Finally, note that 



m 



l-e 



2 b (b b {i-by- b ) 



2(» 



> 



mc 



2em 



(2b) b {i - by- b 



2m 



for some constant c > 1 depending only on 6, since b b (l — b) 1 < 1. 



□ 



The forms T\\2 and T\i\ can be analyzed in the same way as T^w by permuting the roles of the 
x-variables, the y-variables and the z-variables. Similarly to the statement of Proposition 16.11 the form 
TfyP gives a direct sum of ^-tensors with support isomorphic to (1,H,1), each component in the 
tensors being isomorphic to (q 2bm , q 2 0-~ b ) m ; q 2bm ). The form T^ 2 ™ 1 gives a direct sum of "^-tensors with 
support isomorphic to supp c ((.ff, 1, 1)), each component being isomorphic to (g 2 ( 1_b ) m , q 2bm , q 2bm ). 

Suppose that different constants are used to treat each of the three forms: the forms T112 and T\i\ are 
processed with some constant b, while T211 is processed with another constant b. For any fixed values 
a n2 , a 2 n and any e > 0, the form T®% 112N ® T^ 112 
sum of 



1 Tf^ 211 N can then be used to construct a direct 



1 



N 3jNe 



(2b) b (l-b) 



1-6 



2an 2 AT 



(26)6(1 - b) 



1-6 



c^iiA^ 



"rf -tensors, for some value d > 1 depending only on b and b. Each of these ^-tensors has a support 
isomorphic to supp c ((-ffn2, #112 > H211)), where 



H\\2 

H211 



9 
9 



1 



2V 



2V 



(26) b (l - &)M 
(26) S (1 - 6) (1 ~ B) 



a\\2N 



a 2 i\N 



In all these 'rf -tensors, each component is isomorphic to the rectangular matrix multiplication 



^(aii2+a2iif>)7V ^(0112+02116)^ ^(2aii2&+a 2 ii(l-o))A^ 

We can then use Propositions 12.21 and 12 . 3 1 to convert each ^-tensor into a direct sum of at least f -Km x 

min(#n2, H 2 ii) trilinear forms, each isomorphic to (gtem+^u^iV q (ai 12 +a2ub)N^ q (2au2b+a 2 ii(l-b))Ny 
We thus obtain the following result. 
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Proposition 6.2. Let a\\2 and 0211 be any two positive constants. Let b and b be any two constants such 
that 0.916027 < b, b < 1. Then there exists a constant d > 1 such that, for any e > 0, the form 



j>(g)aii2-/V ^ rp®a^± 2 N ^ rp®a 2 iiN 



L 211 



can be converted into a direct sum of 
( 



n 



1 



N i c >Ne 



V 



2 2a 112+(l211 



max ( [(26)^(1 - b) 



l-b-|1112 



(26) 6 (l-6) 



1-6 



«211 



/ 



/orms, each form being isomorphic to ( q ( a n2+a 211 b)N ^ q (a 112 +a 211 b)N ^ q (2a 112 b+a 211 (i-b))Ny 

6.2 Main theorem 

Let us define the following three quantities. 

Q = (2g) ai °3+ a 3oi x (q 2 + 2) a202 x g a n2+ a 2nb 



R 
M 



(2g) 2a ° 13 x (a 2 + 2) a ° 22 x g 2a n2 fe +(i _J, ) a 2ii 

22aii2+a2ii 



A^A^A^A^A^ max f [(26)6(1 - 6)i-°p 12 , f(26) 5 (l - 6) 



\\-b 



0.211 



Our main theorem gives an upper bound on cj(1, 1, k) that depends on these quantities. 

Theorem 6.1. Let q be any positive integer and b, b be such that 0.916027 < b,b < 1. Let aoo4, 0400, 
a 0i3> ^103. 0301, ao22» 0202, ^112 and 0211 be any nine positive rational numbers satisfying the following 
three conditions: 

• 2aoo4 + «400 + 2aoi3 + 2aio3 + 20301 + 0022 + 2a202 + 2an2 + 0*211 = 1/ 

• ^01302020112 = 010300220211,' 

• A^Af^A^A^Af 4 > Bq° Bf 1 B^ 2 Bf 3 Bf 4 , 



Then 



Proof. Let e > be an arbitrary positive value. Let iV be a large integer and consider the trilinear form 
(F q <g) F q )® N . Theorem 15.11 shows that this form can be used to obtain a direct sum of 



n = Sl 



1 



jYl0+8e 1 5Ar < E 



forms, each isomorphic to 



aAo aAi aA 2 aA 3 aA 4 
^0 ^1 ^2 ^3 ^4 



i,jr',fe: i+j'+fc=4 



All the terms in this form, except T\\2, X121 and T211, correspond to matrix multiplications and 



have been analyzed in Section [3] By Proposition 16.21 the part Tf 12 
used to obtain a direct sum of 



'($0,121 iV 



®an 2 N 
112 



r 2 = n 



1 



2 2a 112+0.211 



T®<? llN can be 



iv 



max 



[(2bf(l-b) 



1-610112 



(26)6(1 - 6) 



1-6 



(1211 



30 



matrix multiplications (g( a n2+«2lifc)A^ q (a 112 +a 211 b)N ^ q (2a 112 b+(l-b)a 2 ii)N^ 

This means that the trilinear form (F q (g) F q )® N can be converted into a direct sum of r\T2 matrix 
multiplications (Q , Q N , R N ). In other words: 

rir 2 • (Q N , Q N , R N ) < {F q <g> F q )® N . 

Since R (F q ® F q ) < (q + 2) 2 , as mentioned in Section [3j we know that R ((F q ® Fg)®^) < 
(g + 2) 2Af . By Schonhage's asymptotic sum inequality (Theorem 12. II ) we then conclude that 



T\r-i x Q y ■ < (g + 2) . 



Taking the iV-th root, we obtain: 



xXQ^ 1 ' 1 '^' < (g + 2) 2 . 



For any e > the above inequality holds for large enough integers N. By letting N grow to infinity, and 
then letting e decrease to zero, we conclude that MQ uj{hl, °°^ ) < (q + 2) 2 . □ 

7 Optimization 

In this section we use Theorem 16.11 to derive numerical upper bounds on the exponent of rectangular 
matrix multiplication, and prove Theorem 11.11 

7.1 Square matrix multiplication 

In this subsection we briefly show that our results give, for the exponent of square matrix multiplication, 
the same upper bound as the bound obtained by Coppersmith and Winograd ifTUl . 

Due to the symmetry of square matrix multiplication, we take 6 = 6, 0400 = aoo4> ^103 = a oi3 = 
0301, «022 = 0202 and 0112 = 0211- Then only six parameters remain, and the conditions 00130202^112 = 
010300220211 and Aq° Af 1 A Ai A^ 3 Af 4 = B^B^B^B^B^ 4 are immediately satisfied. 

Theorem 16. II shows that 

8 ai12 x [(2o) 2a ° 13 x (q 2 + 2) a2 ° 2 x gCM-^iw]^ 1 ' 1 ' 1 ) 

A^A^A^A^A* 4 x [(26)^(1 -6)(!- b )] ai12 ~ {q + 2) ' 

By choosing q = 6, 6 = 0.9724317, ai 03 = 0.012506, a 20 2 = 0.102 5 46, am = 0.205542 and 
a 004 = 0.0007/3, we obtain the upper bound 1, 1) < 2.375477. 

This upper bound on the exponent of square matrix multiplication is exactly the same value as in iPTOl . 

obtain 

^(1,1,1) gW(l,l,l) +2 

(26) 6 (1 - 6)M = 2 

and our inequality becomes 

(2g) 2a o i3W ( 1 ' 1 ' 1 ) x (g 2 + 2) a202W ( 1 ' 1 ' 1 ) x (4g w ( 1 ' 1 ' 1 )(g a '( 1 ' 1 > 1 ) + 2)) a 

aAq aA 1 aA 2 aAz aA 4 
^0 ^1 ^2 ^3 ^4 

which is exactly the same optimization problem as in Section 8 of IfTUl . 



This is not a coincidence. Indeed, by setting 6 = n 5(i,i',i') +2 ' which is larger than 0.916027 for q > 5, we 



112 



<(« + 2) 2 , 
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q 


5 


5 


6 


b 


0.984599222 


0.968978515 


0.94866036 


b 


0.919886704 


0.938616630 


0.99996514 


^400 


0.004942000 


0.001498500 


0.00000090 


ai03 


0.010965995 


0.014456894 


0.01553556 


^301 


0.055710210 


0.031215255 


0.00079349 


O022 


0.037622078 


0.065869083 


0.22704392 


^202 


0.138698196 


0.118190058 


0.05836108 


aii2 


0.145715589 


0.178553843 


0.20388121 


«211 


0.245013049 


0.226835534 


0.13394891 


^004 


r\ r\r\r\ 1 1 

0.00011... 


0.000246... 


r\ f\f\ 111/1 

0.001224... 


«013 


0.00500... 


0.010235... 


0.039707... 


aAq aAi aA 2 aA 3 aA 4 
^0 A l A 2 A Z ^4 


0.326588... 


0.3265988... 


0.339123647... 


tdBo TjBl R B 2 R-B3 R^4 

-°0 -°1 -°2 n 3 n A 


0.326587... 


0.3265987... 


0.339123642... 


log i?/ log Q 


0.530200005... 


0.750000001... 


2.00000004... 


(21og(g + 2)-logX)/lo g g 


2.060395... 


2.190086... 


3.256688... 



Table 2: Three solutions for our optimization problem. The first ten rows give (exact) values of the ten 
parameters. The numeral values of the next four rows show that the three conditions < aoo4, «oi3 < 1 
and Aq° Af 1 A^ 2 A3 3 Af 4 > B^B^B^B^B^ 4 are satisfied. The numerical values of the last two 
rows show that u(l, 1, 0.5302) < 2.060396, w(l, 1, 0.75) < 2.190087 and oj(1, 1, 2) < 3.256689. 



7.2 Rectangular matrix multiplication 

In this subsection we explain how to use Theorem 16. 1 1 to derive an upper bound on co(l, l,k) for an 
arbitrary value k, and show how to obtain the results stated in Table Q] and Figured] 

We use the following strategy. We take a positive integer q, seven positive rational numbers 0400, 
a i03> O30i> O022, «202, «H2 and 0211, and two values b, b such that 0.916027 < b, b < 1. We then fix 

«103«022«211 

aoi3 — 

02020112 

and 

1 - (0400 + 2a i3 + 2aio3 + 2a 30 i + a 22 + 2a 2 o2 + 2an 2 + a 2 n) 
aoo4 — • 

The conditions that have to be satisfied are: 

• < a o4,«oi3 < 1; 

• A^A^A^A^Af 4 > Bq° i?f 1 B2 2 B3 3 .Bf 4 . 

If these conditions are satisfied, by Theorem 16. ll this gives the upper bound 

i log_R\ < 21og(< ? + 2)-log.M 



logQJ logQ 

The above discussion reduces the problem of finding an upper bound on u(l, l,k) to solving a 
nonlinear optimization problem. The upper bounds presented in Table Q] are obtained precisely by 
solving this optimization problem using Maple. We show exact values of the parameters proving that 
w(l, 1, 0.5302) < 2.060396, w(l, 1, 0.75) < 2.190087 and w(l, 1, 2) < 3.256689 in Tabled 






7.3 The value $\alpha$



In this subsection we describe how to use Theorem 6.1 to obtain a lower bound on the value $\alpha$, the largest value such that the product of an $n \times n^{\alpha}$ matrix by an $n^{\alpha} \times n$ matrix can be computed with $O(n^{2+\epsilon})$ arithmetic operations for any $\epsilon > 0$. The analysis is more delicate than in the previous subsection, since we need to exhibit parameters satisfying $\mathcal{M}Q^2 = (q+2)^2$ as an equality rather than as an inequality. We do this by determining analytically the optimal values of all but a few parameters.

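Let $q$ be an integer such that $q \ge 5$. For convenience, we write $\kappa = 1/(q+2)^2$. Let $a_{112}$ and $a_{211}$ be any rational numbers such that $0 \le a_{112} \le q\kappa$ and $0 \le a_{211} \le (q^2+2)\kappa$. We set the parameters $b$, $\bar{b}$, $a_{400}$, $a_{103}$, $a_{202}$ and $a_{301}$ as follows:

$$\begin{aligned}
b &= 1,\\
\bar{b} &= 1,\\
a_{400} &= \kappa,\\
a_{103} &= q\kappa - a_{112},\\
a_{202} &= \frac{(q^2+2)\kappa - a_{211}}{2},\\
a_{301} &= q\kappa.
\end{aligned}$$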



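Putting these values in the formula for $Q$, we obtain:

$$Q = (2q)^{q\kappa + a_{103}} \times (q^2+2)^{\frac{(q^2+2)\kappa - a_{211}}{2}} \times q^{a_{112} + q^2 a_{211}/(q^2+2)} = (2q)^{2q\kappa} \times (q^2+2)^{\frac{(q^2+2)\kappa}{2}} \times 2^{-a_{112}} \times \left(\frac{q^{q^2/(q^2+2)}}{\sqrt{q^2+2}}\right)^{a_{211}}.$$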
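The two expressions for $Q$ can be cross-checked numerically. The following Python sketch (the function names are illustrative, not from the paper) evaluates both forms for $q = 5$ and the values $a_{112} = 0.0945442$, $a_{211} = 0.1773724$ used later in this subsection; they agree to floating-point accuracy and give $Q \approx 3.612672$.

```python
import math


def Q_value(q, a112, a211):
    """First form: (2q)^(q*kappa + a103) (q^2+2)^(a202) q^(a112 + q^2 a211/(q^2+2))."""
    kappa = 1 / (q + 2) ** 2
    a103 = q * kappa - a112
    return ((2 * q) ** (q * kappa + a103)
            * (q * q + 2) ** (((q * q + 2) * kappa - a211) / 2)
            * q ** (a112 + q * q * a211 / (q * q + 2)))


def Q_value_simplified(q, a112, a211):
    """Second (simplified) form of the same expression."""
    kappa = 1 / (q + 2) ** 2
    s = q * q + 2
    return ((2 * q) ** (2 * q * kappa)
            * s ** (s * kappa / 2)
            * 2 ** (-a112)
            * (q ** (q * q / s) / math.sqrt(s)) ** a211)


print(Q_value(5, 0.0945442, 0.1773724), Q_value_simplified(5, 0.0945442, 0.1773724))
```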



Observe that $A_1 = A_3 = 2q\kappa$, $A_2 = (q^2+2)\kappa$, $A_4 = \kappa$ and $A_0 = 1 - (A_1 + A_2 + A_3 + A_4) = \kappa$. Then we obtain the following equality:

$$A_0^{A_0}A_1^{A_1}A_2^{A_2}A_3^{A_3}A_4^{A_4} = \frac{(2q)^{4q\kappa}(q^2+2)^{(q^2+2)\kappa}}{(q+2)^2}.$$
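The equality can be checked numerically: with $q = 5$ (so $\kappa = 1/49$) the product $A_0^{A_0}\cdots A_4^{A_4}$ and the closed form on the right-hand side both evaluate to about $0.3211277$, the value quoted later in this subsection. A minimal Python sketch, where the helper `weighted_power_product` is a name of ours:

```python
import math


def weighted_power_product(weights):
    """Return the product of w ** w over the given positive weights."""
    return math.prod(w ** w for w in weights)


q = 5
kappa = 1 / (q + 2) ** 2
A1 = A3 = 2 * q * kappa
A2 = (q * q + 2) * kappa
A4 = kappa
A0 = 1 - (A1 + A2 + A3 + A4)  # equals kappa, since (q+2)^2 - (q^2+4q+3) = 1

lhs = weighted_power_product([A0, A1, A2, A3, A4])
rhs = (2 * q) ** (4 * q * kappa) * (q * q + 2) ** ((q * q + 2) * kappa) / (q + 2) ** 2
print(lhs, rhs)
```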

The following lemma shows that, when $a_{112}$ is small enough, the condition $\mathcal{M}Q^2 = (q+2)^2$ is satisfied.

Lemma 7.1. Suppose that

$$a_{112} \le \left(1 + \frac{2q^2}{q^2+2}\log_2(q) - \log_2(q^2+2)\right)a_{211}. \tag{10}$$

Then $\mathcal{M}Q^2 = (q+2)^2$.

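Proof. Our choice for $b$ and $\bar{b}$ gives $(2b)^{b}(1-b)^{1-b} = (2\bar{b})^{\bar{b}}(1-\bar{b})^{1-\bar{b}} = 2$ (with the convention $0^0 = 1$). Inequality (10) then implies that

$$\left[(2\bar{b})^{\bar{b}}(1-\bar{b})^{1-\bar{b}}\right]^{a_{112}} \le \left(\frac{2q^{2q^2/(q^2+2)}}{q^2+2}\right)^{a_{211}}.$$

In consequence,

$$\mathcal{M} = \frac{(q+2)^2}{(2q)^{4q\kappa}(q^2+2)^{(q^2+2)\kappa}} \times 4^{a_{112}} \times \left(\frac{q^2+2}{q^{2q^2/(q^2+2)}}\right)^{a_{211}},$$

which gives $\mathcal{M}Q^2 = (q+2)^2$. □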
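Condition (10) and its exponentiated form $2^{a_{112}} \le \left(2q^{2q^2/(q^2+2)}/(q^2+2)\right)^{a_{211}}$ are equivalent because $a_{211} \ge 0$. The Python sketch below (hypothetical helper names) tests both forms; for $q = 5$ and $a_{211} = 0.1773724$ the threshold on $a_{112}$ is about $0.09666$, so $a_{112} = 0.0945442$ satisfies (10) while $a_{112} = 0.1$ does not.

```python
import math


def lemma_condition(q, a112, a211):
    """Inequality (10) of Lemma 7.1, in its logarithmic form."""
    s = q * q + 2
    return a112 <= (1 + 2 * q * q / s * math.log2(q) - math.log2(s)) * a211


def lemma_condition_exp(q, a112, a211):
    """Same inequality after exponentiating base 2:
    2**a112 <= (2 * q**(2q^2/(q^2+2)) / (q^2+2)) ** a211."""
    s = q * q + 2
    return 2 ** a112 <= (2 * q ** (2 * q * q / s) / s) ** a211


a211 = 0.1773724
for a112 in (0.0945442, 0.1):
    print(a112, lemma_condition(5, a112, a211), lemma_condition_exp(5, a112, a211))
```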
We now explain how to determine the three remaining parameters $a_{004}$, $a_{013}$ and $a_{022}$. Remember that the parameters should satisfy the equalities

$$a_{013} = \frac{a_{103}\,a_{211}}{a_{202}\,a_{112}}\,a_{022}$$

and

$$2a_{004} + a_{400} + 2a_{013} + 2a_{103} + 2a_{301} + a_{022} + 2a_{202} + 2a_{112} + a_{211} = 1.$$

From our choice of parameters, the second equality can be rewritten as $2a_{004} + 2a_{013} + a_{022} = \kappa$. Since the parameter $a_{004}$ should be positive, we obtain the condition

$$\left(\frac{4(q\kappa - a_{112})\,a_{211}}{((q^2+2)\kappa - a_{211})\,a_{112}} + 1\right)a_{022} < \kappa. \tag{11}$$

If $a_{022}$, $a_{112}$ and $a_{211}$ satisfy this inequality, then the parameter $a_{004}$ is determined:

$$a_{004} = \frac{\kappa}{2} - \left(\frac{4(q\kappa - a_{112})\,a_{211}}{((q^2+2)\kappa - a_{211})\,a_{112}} + 1\right)\frac{a_{022}}{2}.$$

Note that Inequality (11) forces the value $a_{013}$ to be at most 1.
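The two remaining weights follow from $q$, $a_{022}$, $a_{112}$ and $a_{211}$ alone. The Python sketch below (names are ours) uses $a_{013} = \frac{a_{103}a_{211}}{a_{202}a_{112}}a_{022}$ with $a_{103} = q\kappa - a_{112}$ and $a_{202} = ((q^2+2)\kappa - a_{211})/2$, then the expression for $a_{004}$ above, and tests Inequality (11). With the values used later for $q = 5$, $a_{004}$ is positive but small (about $1.5 \times 10^{-4}$), so (11) holds with little slack.

```python
def _ratio(q, a112, a211):
    """The fraction 4(q*kappa - a112) a211 / (((q^2+2) kappa - a211) a112)."""
    kappa = 1 / (q + 2) ** 2
    s = q * q + 2
    return 4 * (q * kappa - a112) * a211 / ((s * kappa - a211) * a112)


def remaining_parameters(q, a022, a112, a211):
    """Return (a013, a004) in terms of the four free parameters."""
    kappa = 1 / (q + 2) ** 2
    r = _ratio(q, a112, a211)
    a013 = r * a022 / 2  # = a103 * a211 * a022 / (a202 * a112)
    a004 = kappa / 2 - (r + 1) * a022 / 2
    return a013, a004


def inequality_11(q, a022, a112, a211):
    """Inequality (11), which is equivalent to a004 > 0."""
    kappa = 1 / (q + 2) ** 2
    return (_ratio(q, a112, a211) + 1) * a022 < kappa


print(remaining_parameters(5, 0.0174853, 0.0945442, 0.1773724))
```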

All the values are thus determined by the choice of $q$, $a_{022}$, $a_{112}$ and $a_{211}$. In particular, we obtain

$$R = (2q)^{\frac{4(q\kappa - a_{112})\,a_{211}}{((q^2+2)\kappa - a_{211})\,a_{112}}\,a_{022}} \times (q^2+2)^{a_{022}} \times q^{2a_{112} + \frac{2}{q^2+2}a_{211}}.$$
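A direct transcription of this formula as a Python sketch (the function name `R_value` is ours) reproduces the value $R \approx 1.475744$ reported below for $q = 5$, $a_{022} = 0.0174853$, $a_{112} = 0.0945442$ and $a_{211} = 0.1773724$.

```python
def R_value(q, a022, a112, a211):
    """R as a function of q, a022, a112, a211, following the display above."""
    kappa = 1 / (q + 2) ** 2
    s = q * q + 2
    ratio = 4 * (q * kappa - a112) * a211 / ((s * kappa - a211) * a112)
    return (2 * q) ** (ratio * a022) * s ** a022 * q ** (2 * a112 + 2 * a211 / s)


print(R_value(5, 0.0174853, 0.0945442, 0.1773724))
```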



We can similarly express the values of $B_0$, $B_1$, $B_2$, $B_3$ and $B_4$ as functions of these four parameters. We then want to solve the following optimization problem.

Maximize $\frac{\log R}{\log Q}$ subject to

• $0 \le a_{022} \le 1$, $0 \le a_{112} \le q\kappa$ and $0 \le a_{211} \le (q^2+2)\kappa$;

• $q$ is an integer such that $q \ge 5$;

• Inequalities (10) and (11) hold;

• $\frac{(2q)^{4q\kappa}(q^2+2)^{(q^2+2)\kappa}}{(q+2)^2} \ge B_0^{B_0}B_1^{B_1}B_2^{B_2}B_3^{B_3}B_4^{B_4}$.



By taking the values $q = 5$, $a_{022} = 0.0174853$, $a_{112} = 0.0945442$ and $a_{211} = 0.1773724$, we obtain $\alpha \ge \frac{\log R}{\log Q} > 0.30298$. These parameters satisfy all the constraints. In particular, we obtain the following numerical values:



$$\begin{aligned}
\frac{(2q)^{4q\kappa}(q^2+2)^{(q^2+2)\kappa}}{(q+2)^2} &= 0.3211277\ldots\\
B_0^{B_0}B_1^{B_1}B_2^{B_2}B_3^{B_3}B_4^{B_4} &= 0.3211276\ldots\\
R &= 1.475744\ldots\\
Q &= 3.612672\ldots
\end{aligned}$$

A more precise lower bound on $\alpha$ can be found using optimization software and high-precision arithmetic. Using Maple and truncating the result of the optimization after the 25th digit, we find that for $q = 5$ the values

$$\begin{aligned}
a_{022} &= 0.0174853267797595451457284\\
a_{112} &= 0.0945442542111395375830367\\
a_{211} &= 0.1773724081899825630904504
\end{aligned}$$

give the lower bound

$$\alpha > 0.3029805825293869820274449.$$
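The objective $\log R/\log Q$ can be re-evaluated at these truncated values with arbitrary-precision arithmetic. The sketch below uses Python's standard `decimal` module at 50 significant digits in place of Maple (the function name `log_ratio` is ours), and writes $\log R$ and $\log Q$ as sums of exponents times natural logarithms, following the expressions for $R$ and $Q$ derived earlier in this subsection. It only evaluates the objective at the given point; it does not perform the optimization or check the constraint involving the $B_i$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits for all Decimal operations


def log_ratio(q, a022, a112, a211):
    """log R / log Q for the Section 7.3 parametrization, in decimal arithmetic."""
    q = Decimal(q)
    kappa = 1 / (q + 2) ** 2
    s = q * q + 2
    ln2q, lns, lnq = (2 * q).ln(), s.ln(), q.ln()
    ratio = 4 * (q * kappa - a112) * a211 / ((s * kappa - a211) * a112)
    log_R = ratio * a022 * ln2q + a022 * lns + (2 * a112 + 2 * a211 / s) * lnq
    log_Q = ((2 * q * kappa - a112) * ln2q        # exponent q*kappa + a103
             + (s * kappa - a211) / 2 * lns       # exponent a202
             + (a112 + q * q * a211 / s) * lnq)
    return log_R / log_Q


alpha = log_ratio(5,
                  Decimal("0.0174853267797595451457284"),
                  Decimal("0.0945442542111395375830367"),
                  Decimal("0.1773724081899825630904504"))
print(alpha)
```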